\documentclass[review=false, screen]{ocsmnar}

% Only used to typeset XeLaTeX-Logo below.
\usepackage{metalogo}

% Adjust this to the language used.
\usepackage[british]{babel}

% use nice mathematical symbols
\usepackage{amsfonts}
\usepackage{adjustbox}

\newcommand{\R}{\mathbb{R}}
\newcommand{\Oc}{\mathcal{O}}

\begin{document}

\title{Anomaly Detection in Wireless Sensor Networks: A Survey}

\seminar{SVS} % Selbstorganisation in verteilten Systemen
\semester{Sommersemester 2020}

\author{Anton Lydike}
\affiliation{\institution{Universität Augsburg}}

\begin{abstract}
Anomaly detection is an important problem in data science, which is encountered often when data is collected and analyzed. An anomaly is often defined as a measurement that is inconsistent with the expected results. Since anomaly detection can be applied to many different environments, a multitude of different research contexts and application domains exist in which anomaly detection is researched. Anomaly detection in wireless sensor networks (WSN) is a relatively new addition to the field, where a lot of active research is done and new methods are proposed regularly. The context of WSN introduces a lot of interesting new challenges, as nodes are often small devices running on battery power and cannot do complex computation on their own. Furthermore, in WSN communication is often not perfect and messages get lost during operation. Any protocols that incur additional communication must have a good justification, as communication is expensive. All these factors create a unique environment, in which not many previously existing solutions to the problem are applicable without adaptation. This survey will focus on four different types of anomalies, and then look at two fundamental problems related to anomaly detection. First, sensor self-calibration is explored as a method to improve sensor reliability. Next, different methods of detecting outliers are looked at and evaluated.
Here, conventional model-based approaches such as statistical or density-based models are covered first. Afterwards, the newer machine learning based approaches to outlier detection are investigated. In the end, all approaches presented in this paper are tabulated and evaluated based on communication overhead, requirements of prior knowledge, centralization, required network topology and more.
\end{abstract}

\keywords{Wireless Sensor Networks, Anomaly detection, Outlier detection, Sensor calibration, Drift detection}

\maketitle

\section{Introduction}
A Wireless Sensor Network (WSN) is commonly defined as a collection of battery powered nodes, which communicate using a low-bandwidth and low-power wireless transceiver. Each node contains an array of sensors and collects data on its surroundings. This offers a versatile platform that can be deployed to perform various tasks, such as monitoring a wide range of physical or environmental conditions, e.g. temperature, humidity, pollution, noise, motion and more \cite{xie2011anomaly}. WSN can also be deployed over large areas at a comparatively low cost and even track the behavior of animals \cite{cassens2017automated}. The environment they are deployed in also imposes restrictions on nodes, for example to be lightweight and/or relatively cheap.

In most cases, it is preferable to prolong the lifetime of each node as long as possible. The power required to transmit data is often the largest contributing factor to the lifetime of each node, as it drains the battery \cite{sheng2007}. Especially if the network collects large amounts of data, or spans large areas, a lot of energy can be saved by reducing the number and size of the transmissions. An ideal solution would be to not send unimportant data at all. This gives rise to the need for anomaly detection in WSN, which enables nodes to identify important data themselves. This is, however, not the only reason why anomaly detection is interesting.
Some WSN are deployed to detect phenomena such as forest fires \cite{hefeeda2007wireless}, or to monitor active volcanoes \cite{werner2006deploying}. In these cases, anomaly detection is not only used to limit the required communication, but also to fulfill the core purpose of the network.

Not all approaches to anomaly detection in WSN are able to run directly on the node. This survey therefore differentiates between \emph{decentralized} (algorithms running directly on the node) and \emph{centralized} (running at a central location) methods. A decentralized approach is not always beneficial, as some networks are less restricted by their energy budget (for example by having a power supply or being frequently serviced by personnel) and would rather use greater computational power and a complete set of data (meaning data from all sensors, not just ones in a local area) to improve their detection and/or prediction accuracy. This is often encountered in industrial settings \cite{ramotsoela2018}.

Another factor for these models is the network topology. In a non-static WSN, a model using neighborhood information has to account for changes in the network topology surrounding it, as the number of neighbors changes, or the data they measured previously does not actually belong to the current neighborhood. If the node keeps track of previous measurements, it also needs to take into account how changes in its position might influence the measured data.

\subsection{Problem definition}
An anomaly is a collection of one or more temporally correlated measurements in a given data set that seem to be inconsistent with expected results. These measurements can originate from different sensors, and in the context of WSN even from different nodes. Bosman et al. \cite{bosman2013} and others distinguish between four different kinds of anomalies relevant in WSN (cf.
Figure~\ref{fig:noisetypes}):
\begin{itemize}
\item \emph{Spikes} are short changes with a large amplitude.
\item \emph{Noise} is (an increase of) variance over a given time.
\item \emph{Constant} is the sudden absence of noise.
\item \emph{Drift} is an offset which increases over time.
\end{itemize}

A fifth anomaly type, \emph{sensor failure}, is commonly added to anomaly detection \cite{rajasegarar2008,chandola2009}. Since sensor failure often manifests itself in the four ways mentioned above, and we are not interested in sensor fault prediction, detection and management here, faulty sensors will not be discussed further.

Detecting constant type anomalies is not very difficult, as they can simply be classified as areas of data for which the second (numerical) derivative is zero, and no complex models are needed to identify them. Therefore they will not be covered any further in this survey. Instead, we will focus on two separate problems: self-calibration of sensors and detecting outliers, as both of these are closely connected to anomaly detection in general.

A noise anomaly is not the same as a noisy sensor. A noisy sensor always produces noisy data, while a noise anomaly is the sudden increase in noise in a signal. Working with noisy data is a problem in WSN, but we will not focus on methods of cleaning noisy data, as it is not in the scope of this survey. Elnahrawy et al. \cite{elnahrawy2003} and Barcelo et al. \cite{barcelo2019} are good starting points for a survey in this direction.

In the general field of anomaly detection, more advanced definitions of anomalies can include patterns (cf. Figure~\ref{fig:patternanomaly}) and other contextual phenomena, but these are much rarer in WSN, as such networks mostly measure less complex data, such as vibrations, temperature, etc. Therefore most approaches discussed in this survey do not take these anomalies into account and instead focus on the ones discussed above.
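
As a rough formalisation of the four anomaly types listed above, one can write a measurement as $y_t = x_t + \varepsilon_t$, where $x_t$ is the true signal and $\varepsilon_t \sim \mathcal{N}(0,\sigma^2)$ is ordinary sensor noise. The symbols $a$, $\tau$, $\sigma'$, $d$ and the onset time $t_0$ are assumptions introduced here for illustration and are not taken from Bosman et al.; in particular, the linear growth of the drift is only one possible choice.
\begin{align*}
\text{Spike:} \quad & y_t = x_t + \varepsilon_t + a, && t_0 \le t < t_0 + \tau,\ |a| \gg \sigma,\ \tau \text{ small} \\
\text{Noise:} \quad & y_t = x_t + \varepsilon'_t,\ \varepsilon'_t \sim \mathcal{N}(0,\sigma'^2), && t \ge t_0,\ \sigma' > \sigma \\
\text{Constant:} \quad & y_t = y_{t_0}, && t \ge t_0 \\
\text{Drift:} \quad & y_t = x_t + \varepsilon_t + d\,(t - t_0), && t \ge t_0
\end{align*}
Under this sketch, the derivative criterion for constant type anomalies becomes a test on the discrete second difference, $\Delta^2 y_t = y_{t+1} - 2y_t + y_{t-1} = 0$ over a run of consecutive samples, which a frozen sensor satisfies exactly while a noisy healthy sensor rarely does.
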
\begin{figure}
\includegraphics[width=8.5cm]{img/anomaly_types.png}
\caption{Spike, noise, constant and drift type anomalies in noisy linear data, image from Bosman et al. \cite{bosman2013}.}
\label{fig:noisetypes}
\end{figure}

\begin{figure}
\includegraphics[width=8.5cm]{img/pattern-anomaly.png}
\caption{Pattern based anomaly corresponding to an Atrial Premature Contraction in an electrocardiogram, image from Chandola et al. \cite{chandola2009}.}
\label{fig:patternanomaly}
\end{figure}

The terms outlier and anomaly are often used interchangeably, but refer to slightly different phenomena. While an anomaly falls into one of these four categories, only spikes, noise, and some types of drift are considered \emph{outliers} \cite{chandola2009}, as they are the only ones that produce data outside of the considered ``norm''.

\subsubsection{Self-calibration}
The main problem of self-calibrating WSN is obtaining ground-truth data for each node in the network. This data is required to calculate the offset between the node's measurements and the ground truth in order to calibrate it. Since it is often infeasible to visit every node in a network, methods need to be formulated to approximate ground truth data from either non-calibrated sensors, or a calibrated sensor located some distance away.

\subsubsection{Outlier detection}
The problem of outlier detection in WSN is the creation of a model which can use past data to either predict or classify measurements. If the model is able to predict future measurements, outliers are detected by their deviation from the predicted value, while models that can classify data can directly label measurements as anomalous.

Another aspect of outlier detection in WSN is the fact that each node does not possess perfect knowledge about all measurements. A method can collect information inside a neighborhood, which might increase its detection accuracy, but this will also incur a considerable cost in the form of power consumption.
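
The prediction-based variant described above can be written as a simple residual test. The threshold $\tau$ and the choice $\tau = 3\hat{\sigma}$, with $\hat{\sigma}$ the standard deviation of past residuals, are assumptions made for illustration; the surveyed approaches differ in how they choose it.
\begin{align*}
r_t &= \lvert z_t - \hat{z}_t \rvert, & z_t \text{ is flagged as an outlier if } r_t > \tau
\end{align*}
For example, with a predicted temperature $\hat{z}_t = 21.4$, $\hat{\sigma} = 0.3$ and therefore $\tau = 0.9$, a measurement $z_t = 23.0$ gives $r_t = 1.6 > 0.9$ and is flagged, while $z_t = 21.9$ gives $r_t = 0.5$ and is accepted.
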
\subsection{Structure}
After the introduction and coverage of related work, we will look into sensor self-calibration, a method of improving sensor accuracy. Calibrating a sensor will remove constant offsets, enabling nodes to compare measurements between one another more easily. If a sensor is in use for a prolonged length of time, it might need recalibration in order to remove sensor drift. Afterwards we will look into a collection of different outlier detection methods, ranging from statistical methods to machine learning.

\section{Related Work}
Chandola et al. \cite{chandola2009} provide a very comprehensive survey on anomaly detection in general, not just focused on WSN. They introduce many key concepts and definitions, but focus more on outliers than anomalies in general.

O'Reilly et al. \cite{oreilly2014} look into anomaly detection in WSN in the specific context of non-stationary environments, meaning environments where the ``normal'' state evolves over time and is not static. Due to the nature of the problem, almost all approaches presented there have some machine learning aspects to them, as they need to first detect when a change of model is required, and then create a new model that conforms to the new data sensed by the network.

McDonald et al. \cite{mcdonald2013} survey methods of finding outliers in WSN, with a focus on distributed solutions. They go into a moderate amount of detail on most solutions, but skip over methods such as principal component analysis (see Section~\ref{cap:pca}) and support vector machines (see Section~\ref{cap:svm}), which were already maturing at that point in time. Instead, they only present distance and density based approaches.

Barcelo-Ordinas et al. \cite{barcelo2019} provide a very in-depth reference study for sensor self-calibration in which they analyze 39 different approaches in several different categories. This survey is covered further in the section on sensor self-calibration.

Ramotsoela et al.
\cite{ramotsoela2018} survey anomaly detection in industrial settings, where machine learning is preferred due to the observed phenomena being more complex. The survey covers both intrusion detection and outlier detection methods, and compiles a table of 17 different approaches to anomaly detection. They look at six fundamentally different approaches and score them based on accuracy, prior knowledge, complexity and data prediction. They look more closely at $k$-nearest neighbor models but find problems similar to those mentioned in Sections~\ref{sec:distance} and~\ref{sec:density}.

Further information concerning advanced machine learning models such as deep learning techniques is provided by Chalapathy et al. \cite{chalapathy2019} and Kakanakova et al. \cite{kakanakova2017}. Neither of these surveys focuses on WSN, but both propose methods which are applicable to the general field.

\section{Sensor Drift and Self-Calibration}
Advancements in energy storage density, processing power and sensor availability have increased the possible length of deployment of many WSN. This increase in sensor lifetime, together with an increase in node count due to reduced part cost \cite{wang2016}, as well as the introduction of the Internet of Things (IoT), has brought forth new problems in sensor calibration and drift detection \cite{dehkordi2020}. Increasing the amount of collected data and the length of time over which it is collected introduces a need for better quality control of the sensors that data came from. Ni et al. \cite{ni2009} noticed drift as high as 200\% in soil CO$_2$ sensors, while Buonadonna et al. \cite{buonadonna2005} noticed that their light sensors (which were calibrated to the manufacturer's specification) were performing very poorly when measured against laboratory equipment. It is out of these circumstances that the need arises for better and more frequent sensor calibration.
\begin{figure*}[ht]
\includegraphics[width=\textwidth]{img/calibration_attributes.png}
\caption{Categories of calibration approaches, from Barcelo-Ordinas et al. \cite{barcelo2019}}
\label{fig:calcats}
\end{figure*}

The field of self-calibration in WSN is fairly broad. In order to get an overview over all approaches, Barcelo-Ordinas et al. \cite{barcelo2019} categorized each approach by seven different attributes (Figure~\ref{fig:calcats}):
\begin{itemize}
\item \emph{Area of interest} distinguishes between \emph{micro} (calibrating sensors to minimize error to a single data point), and \emph{macro} (calibrating nodes to minimize error over a given area of nodes).
\item \emph{Number of sensors} determines whether data from other sensors is used, so called \emph{sensor fusion}, or whether calibration is done with just a \emph{single sensor}.
\item \emph{Ground truth} specifies whether the calibration is done in relation to a known good sensor (\emph{non-blind}), or without one (\emph{blind}). If both calibrated and uncalibrated sensors are used, the approach is considered \emph{semi-blind}.
\item \emph{Position from reference} is the distance between the calibration target and the point where the reference data is collected. If data from the close neighborhood is used, the approach is considered \emph{collocated}. If instead nodes are calibrated hop-by-hop in an iterative fashion, it is called \emph{multi-hop}. In \emph{model-based} calibration, fixed ground truth sensors are used in combination with a model to predict sensor error.
\item \emph{Calibration time} distinguishes between \emph{pre/post-\break deployment calibration}, \emph{periodic} (calibration at given intervals) and \emph{opportunistic} (when nodes in a mobile network come into range of a calibration source).
\item \emph{Operation mode} is either \emph{offline} (calibration when the node is not used) or \emph{online} (calibration during normal operation).
\item \emph{Processing mode} divides the approaches into \emph{centralized} processing, meaning calibration parameters are calculated by a central node and then distributed over the network, and \emph{decentralized}, where a single node or a collection of nodes collaborates to calculate the calibration parameters.
\end{itemize}

This survey will focus on self-calibration techniques that can be used during deployment, as they contribute the most to the accuracy of anomaly detection. We will therefore only cover approaches that can be used in a periodic and opportunistic manner. The biggest problem encountered when trying to calibrate a sensor in the field is the availability of ground truth. Therefore we will focus on the aspects of blind vs. non-blind techniques.

\subsection{Problems in Blind Self-Calibration Approaches}
The central problem in self-calibration is predicting the error of a given sensor. This is done by comparing the sensor output to so called ground truth data. If no ground truth data is available to the node, it has to be approximated.

Kumar et al. \cite{kumar2013} propose a solution that uses no ground-truth sensors and can be used online in a distributed fashion. It uses spatial Kriging (Gaussian interpolation) and Kalman filtering (a linear approximation model accounting for noise, explained in detail in Section~\ref{sec:kalman}) on neighborhood data in order to reduce noise and remove drift. It assumes that sensor drift over a large number of sensors will cancel out. This solution suffers from accumulative error due to a missing ground truth, as the system has no point of reference or general model to rely on.

The uncertainty of the model, and thereby the accumulative error, can be reduced by increasing the number of sensors which are used. A common method for gaining more measurements is increasing network density \cite{wang2016}, or switching from a single-sensor approach to sensor fusion. Barcelo-Ordinas et al.
\cite{barcelo2018} explore the possibility of adding multiple copies of the same kind of sensor to each node. All of these approaches are shown to reduce the accumulative error inherent in blind self-calibration approaches but cannot completely negate it. This is a problem for networks that are planned to operate over a long time span (e.g. multiple years). In those cases, non-blind calibration might be a better suited solution.

\subsection{Non-Blind Self-Calibration Techniques}
Non-blind, also known as reference-based, calibration approaches rely on known-good reference information. This data is often gathered from much more expensive sensors, which often come with restrictions on their use, e.g. local weather stations not reporting continuous data, and not at the exact location of the WSN.

One of the easier methods is simply calibrating the sensors in a laboratory setting (e.g. \cite{ramanathan2006}). A recently calibrated sensor is used for calibration within a controlled environment pre and/or post deployment as per the manufacturer's specifications, and the calibration parameters are applied to the collected data. While this improves the accuracy of the measured data, it is of limited usefulness if live readings from the network need to be accurate, or if data is compared to neighbors but high sensor drift is expected.

An approach by Hasenfratz et al. \cite{hasenfratz2012} can calibrate low-cost gas sensors instantly with a calibrated sensor nearby, enabling calibration in the field without the need of a controlled environment or laboratory setting. They make use of the fact that air measurements are continuous and vary only slightly over short distances. This of course comes with a trade-off in accuracy, but they show that the calibration is as good as the manufacturer's.
An ozone sensor calibrated using this scheme is only off by $\pm 2$\,ppb (parts per billion) when compared to a high-quality calibrated ozone sensor, despite the manufacturer's claimed accuracy of $\pm 20$\,ppb.

While these results are remarkable, it is not always feasible to visit every sensor in a WSN. Maag et al. \cite{maag2017} propose a solution to this problem. They formulate a hybrid solution, where calibrated sensor arrays can be used to calibrate other non-calibrated arrays in a local network of air pollution sensors over multiple hops with minimal accumulative errors. They show 16-60\% lower error rates than other iterative approaches currently in use.

\subsection{An Example for Blind Calibration} \label{sec:kalman}
Sirisanwannakul et al. \cite{Sirisanwannakul2021} use a blind centralized approach, where humidity sensors are calibrated using Kalman filtering in combination with a neural network to detect and counteract sensor drift.

Kalman filtering consists of two phases, prediction and update. A Kalman filter can, given the previous state of knowledge at step $k-1$ consisting of an estimated system state and uncertainty, calculate a prediction for the next system state and its uncertainty. This is called the prediction phase. Then, a new (possibly skewed) measurement is observed and used to compute an estimate of the actual current state and its uncertainty. This is called the update phase. The filter is recursive in nature and can be calculated with limited hardware in real-time, making it useful for many different anomaly detection applications.

Kalman filters are based on a linear dynamical system on a discrete time domain. They represent the system state as vectors and matrices of real numbers.
In order to use Kalman filters, the observed process must be modeled in a specific structure:
\begin{itemize}
\item $F_k$, the state transition model for the $k$-th step
\item $H_k$, the observation model for the $k$-th step
\item $Q_k$, the covariance of the process noise
\item $R_k$, the covariance of the observation noise
\item Sometimes a control input model $B_k$
\end{itemize}

These must predict the true state $x$ and an observation $z$ in the $k$-th step according to:
\begin{align*}
x_k &= F_kx_{k-1} + B_ku_k + w_k \\
z_k &= H_kx_k+v_k
\end{align*}
where $w_k$ and $v_k$ are noise terms conforming to a zero mean multivariate normal distribution $\mathcal{N}$ with covariance $Q_k$ and $R_k$ respectively ($w_k \sim \mathcal{N}(0,Q_k)$ and $v_k \sim \mathcal{N}(0,R_k)$).

The Kalman filter state is represented by two variables $\hat{x}_{k|j}$ and $P_{k|j}$, which are the state estimate and its covariance at step $k$ given observations up to and including step $j$. When entering step $k$, we can now define the two phases.

\textbf{Prediction phase:}
\begin{align*}
\hat{x}_{k|k-1} &= F_k \hat{x}_{k-1|k-1}+B_ku_k \\
P_{k|k-1} &= F_kP_{k-1|k-1} F_k^\intercal+Q_k
\end{align*}
Here we predict the next state and calculate our confidence in that prediction. If we are now given our measurement $z_k$, we enter the next phase. \\
\textbf{Update phase:}
\begin{align*}
\tilde{y}_k &= z_k - H_k\hat{x}_{k|k-1} & \text{Innovation (forecast residual)} \\
S_k &= H_kP_{k|k-1} H_k^\intercal+R_k & \text{Innovation covariance} \\
K_k &= P_{k|k-1}H_k^\intercal S_k^{-1} & \text{Optimal Kalman gain} \\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\tilde{y}_k & \text{State estimate} \\
P_{k|k} &= (I-K_kH_k)P_{k|k-1} & \text{Covariance estimate}
\end{align*}
After the update phase, we obtain $\hat{x}_{k|k}$, which is our best approximation of the real state.

Sirisanwannakul et al. take the computed Kalman gain and examine its bias. In normal operation, the gain is biased towards the measurement.
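
What a bias of the gain towards measurement or prediction means can be seen in the scalar special case with $H_k = 1$. This simplification is an assumption for illustration and is not the setup used by Sirisanwannakul et al.; the numbers below are made up.
\begin{align*}
K_k &= \frac{P_{k|k-1}}{P_{k|k-1} + R_k}, &
\hat{x}_{k|k} &= (1 - K_k)\,\hat{x}_{k|k-1} + K_k z_k
\end{align*}
The gain lies between $0$ and $1$: a value close to $1$ weights the measurement, a value close to $0$ weights the prediction. With $P_{k|k-1} = 4$, $R_k = 1$, $\hat{x}_{k|k-1} = 20$ and $z_k = 22$, we get $K_k = 0.8$, $\hat{x}_{k|k} = 21.6$ and $P_{k|k} = (1 - 0.8) \cdot 4 = 0.8$.
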
If the sensor malfunctions, the bias is towards the prediction. But if the gain's bias is between prediction and measurement, the system assumes sensor drift and corrects automatically. Since this approach lacks a ground truth measurement it cannot recalibrate the sensor, but the paper shows that accumulative error can be reduced by more than 50\%.

Kalman filters are used in many calibration models, as they provide a relatively simple way to calculate the confidence in a given measurement, and to predict measurements in the near future. Furthermore, the calculated confidence makes it easy to identify when the model fails.

\section{Outlier Detection} \label{cap:outlierdet}
This section analyzes several fundamentally different approaches to outlier detection. The approaches are roughly ordered by age, with newer approaches coming last. We will start with basic methods that are used outside of WSN and transition to more specific applications. All approaches covered here are listed in Table~\ref{tbl:comparison} at the end of the survey and analyzed by a few key metrics:
\begin{itemize}
\item \emph{Prior knowledge}: Does an approach require any prior knowledge, for example for constructing models beforehand, or training machine learning models?
\item \emph{Centralized/Decentralized}: Is the outlier detection performed on individual nodes, or at a centralized sink? Some methods work both ways, and some work in a clustered approach.
\item \emph{Required topology}: An approach requiring static topology means it requires the nodes to be stationary.
\item \emph{Communication}: How much communication is required by this approach? ``Normal'' means about the same as streaming all data to the sink, ``Prohibitive'' means that the approach is not usable as is and requires some optimization.
\item \emph{Recalibration}: Does the model need recalibration or updates when the environment changes around it?
\end{itemize}

\subsection{Statistical Analysis}
Classical statistical analysis is done by creating a statistical model of the expected data and then finding the probability for each recorded data point (similar to Kalman filters). Improbable data points are then deemed outliers. The problem for many statistical approaches is finding this model of the expected data, as it is not always feasible to create it in advance, for example when the nature of the phenomena is not well-known, or if the expected data is too complex. It is also not very robust to changes in the environment \cite{mcdonald2013}, requiring frequent updates to the model if the environment changes in ways not foreseen by the model.

Sheng et al. \cite{sheng2007} propose an approach to global outlier detection, meaning a data point is only regarded as an outlier if its value differs significantly from all values collected over a given time, not just from local sensors near the measured one. They propose that the base station requests bucketed histograms of each node's sensor data distribution to reduce the data transmitted. These histograms are polled, combined, and then used to analyze outliers by looking at the maximum distance a data point can be away from its nearest neighbors. This method has some problems, as it fails to account for non-Gaussian distributions. Another problem is the use of fixed parameters for outlier detection, requiring prior knowledge of the data collected and anomaly density. These fixed parameters also have to be updated whenever the properties of the collected data change. Due to the histograms used, this method cannot be used in a shifting network topology.

Böhm et al. \cite{böhm2008} propose a solution not only to non-Gaussian distributions, but also to noisy data. They define a general probability distribution function (PDF) with an exponential distribution function (EDF) as a basis, which is better suited to fitting around non-Gaussian data as seen in Figure~\ref{fig:probdistböhm}.
They then outline an algorithm where the data is split into clusters, an EDF is fitted to each cluster, and outliers are discarded. This method does not require any prior parametrization and is therefore more robust to configuration error. Since this process not only detects outliers, but does a complete clustering of the given data, it is computationally much more expensive than other methods for detecting outliers. However, since this is a complete clustering algorithm, it can be used in offline analysis for clustering and will produce good results quicker than PCA or similar algorithms. Outlier detection is more a byproduct of the clustering than its end result.

\begin{figure}
\includegraphics[width=8.5cm]{img/probability-dist-böhm.png}
\caption{Difference of fitting a Gaussian PDF and a customized exponential PDF. Image from \cite{böhm2008}.}
\label{fig:probdistböhm}
\end{figure}

\subsection{Distance Based Analysis} \label{sec:distance}
An older solution to finding outliers in data is the distance based approach. It assigns an anomaly score to each data point, based on the distance to its $k$ nearest neighbors \cite{zhang2006detecting}. This approach, however, fails at detecting outliers in a system with two or more clusters that do not have the same density. Figure~\ref{fig:densityproblem} shows two clusters $C_1$, $C_2$ with varying density. The point $p_1$ will either be incorrectly identified as a non-outlier, or the whole set of $C_1$ will be identified as outliers together with $p_1$.

\begin{figure}
\includegraphics[width=6.5cm]{img/density-problem.png}
\caption{Two sets of clusters $C_1$ and $C_2$, and two outliers $p_1$ and $p_2$. Image from Chandola et al. \cite{chandola2009}.}
\label{fig:densityproblem}
\end{figure}

\subsection{Density Based Analysis} \label{sec:density}
Outliers can be selected by looking at the density of points as well. If done correctly, the problem described above (Figure~\ref{fig:densityproblem}) can be prevented. Breunig et al.
\cite{breuning2000} propose a method of calculating a local outlier factor (LOF) of each point based on the local density of its $n$ nearest neighbors. The problem lies in selecting good values for $n$. If $n$ is too small, clusters of outliers might not be detected, while a large $n$ might mark points as outliers even if they are in a large cluster of fewer than $n$ points. This problem is further exacerbated when we try to use this in a WSN setting, for example by streaming through the last $k$ points, as cluster size will not stay constant when incoming data is delayed or lost in transit.

Papadimitriou et al. \cite{papadimitriou2003} introduce a parameterless approach. They formulate a method using a local correlation integral (LOCI), which does not require parametrization. It uses a multi-granularity deviation factor (MDEF), which is the relative deviation for a point $p$ in a radius $r$. The MDEF is simply the number of nodes in an $r$-neighborhood divided by the sum of all points in the same neighborhood. LOCI provides an automated way to select good parameters for the MDEF and can detect outliers and outlier-clusters with comparable performance to other statistical approaches. They also formulate aLOCI (approximate LOCI), a linear approximation of LOCI, which also gives accurate results while reducing run time. This approach can be used centralized, decentralized or clustered, depending on the scale of the event of interest. aLOCI appears well suited even to running on the sensor nodes themselves, as it has relatively low computational complexity and can adapt to shifting environments.

\subsection{Principal Component Analysis} \label{cap:pca}
Another way of detecting outliers is by computing the Principal Component Analysis (PCA) of the collected data. This way one can find the variance of the collected data along each axis. If a measured data point is far outside the expected variance ranges, it can be flagged as anomalous.
PCA can also be used to reduce the number of dimensions a set of data contains while minimizing the loss of meaningful information.

\begin{figure*}[ht]
\includegraphics[width=0.8\textwidth]{img/PCA.png}
\caption{An example of reducing a three-dimensional data set to two dimensions using PCA to minimize loss of information. PCA vectors are marked red.}
\label{fig:pca}
\end{figure*}

The principal components of a point cloud in $\R^n$ are $n$ vectors $p_i$, where $p_i$ defines a line with minimal average square distance to the point cloud while lying orthogonal to all $p_j, j