diff --git a/References.bib b/References.bib index 7e3afcf..f8fc030 100644 --- a/References.bib +++ b/References.bib @@ -249,7 +249,33 @@ year={2018}, publisher={IEEE} } - +@InProceedings{Sirisanwannakul2021, +author="Sirisanwannakul, Kraithep +and Siripool, Nutchanon +and Kongprawechnon, Waree +and Dangsakul, Prachumpong +and Lewlomphaisarl, Udom +and Sartsatit, Seksun +and Duangtanoo, Thanika +and Keerativittayanun, Suthum +and Suhaili, Wida Susanty Haji +and Owada, Yasunori +and Mya, Khin Than +and Ariffin, Sharifah H. S. +and Karnjana, Jessada", +editor="Suhaili, Wida Susanty Haji +and Siau, Nor Zainah +and Omar, Saiful +and Phon-Amuaisuk, Somnuk", +title="Humidity Sensor Drift Detection and Correction Based on a Kalman Filter with an Artificial Neural Network for Commercial Cultivation of Tropical Orchids", +booktitle="Computational Intelligence in Information Systems", +year="2021", +publisher="Springer International Publishing", +address="Cham", +pages="140--149", +abstract="Polymer dielectric-based humidity sensors used in the orchid greenhouse monitoring system usually work improperly after continuously being used in a high humid condition for some time (e.g., after eight months). This problem, called sensor drift, has been broadly observed. This paper proposes a simple data-driven technique based on a Kalman filter with an artificial neural network to detect the drift and correct data. The combination of two proposed measures based on the $L^1$ distance and the cosine similarity is used to determine the sensor's status, which is later used to adjust the Kalman gain accordingly. That is, when the sensor malfunctions, the gain is biased toward the prediction. When the sensor is in the normal status, the gain is biased toward the measurement. When the sensor drift is detected, the gain varies in between the prediction and the measurement. 
The experimental results show that the proposed method could reduce the accumulated mean absolute deviation by approximately 55.66{\%}.", +isbn="978-3-030-68133-3" +} @article{mohanty2020, title={Deep learning with LSTM based distributed data mining model for energy efficient wireless sensor networks}, diff --git a/kalman-filter b/kalman-filter index 89c93d1..650879a 100644 --- a/kalman-filter +++ b/kalman-filter @@ -1,45 +1,6 @@ The root problem of drift detection and correction is predicting sensor measurements. This can usually be accomplished in two ways: -This usually requires one or more time series of data and an algorithm which consumes these time series and produces a prediction for the value a sensor should measure next. The most commonly used model for this by far is called \emph{Kalman filtering}, which consists of two phases: +This usually requires one or more time series of data and an algorithm which consumes these time series and produces a prediction for the value a sensor should measure next. -Given the previous state of knowledge at step $k-1$ (estimated system state and uncertainty), we calculate a prediction for the next system state and uncertainty. This is the prediction phase. We then observe a new (possibly skewed) measurement and compute our prediction of the actual current state and uncertainty (update phase). This algorithm is recursive in nature and can be calculated with limited hardware in real-time. - -Kalman filters are based on a linear dynamical system on a discrete time domain. It represents the system state as vectors and matrices of real numbers. 
In order to use Kalman filters, the observed process must be modeled in a specific structure: - -\begin{itemize} - \item $F_k$, the state transition model for the $k$-th step - \item $H_k$, the observation model for the $k$-th step - \item $Q_k$, the covariance of the process noise - \item $R_k$, the covariance of the observation noise - \item Sometimes a control input model $B_k$ -\end{itemize} - -These models must predict the true state $x$ and an observation $z$ in the $k$-th step according to: - -\begin{align*} - x_k &= F_kx_{k-1} + B_ku_k + w_k \\ - z_k &= H_kx_k+v_k -\end{align*} - -Where $w_k$ and $v_k$ is noise conforming to a zero mean multivariate normal distribution $\mathcal{N}$ with covariance $Q_k$ and $R_k$ respectively ($w_k \sim \mathcal{N}(0,Q_k)$ and $z_k \sim \mathcal{N}(0,R_k) $). - -The Kalman filter state is represented by two variables $\hat{x}_{k|j}$ and $P_{k|j}$ which are the state estimate and covariance at step $k$ given observations up to and including $j$. - -When entering step $k$, we can now define the two phases. \textbf{Prediction phase:} -\begin{align*} - \hat{x}_{k|k-1} &= F_k \hat{x}_{k-1|k-1}+B_ku_k \\ - P_{k|k-1} &= F_kP_{k-1|k-1} F_k^\intercal+Q_k -\end{align*} -Where we predict the next state and calculate our confidence in that prediction. If we are now given our measurement $z_k$, we enter the next phase. \textbf{Update phase:} - -\begin{align*} - \tilde{y}_k &= z_k - H_k\hat{x}_{k|k-1} & \text{Innovation (forecast residual)} \\ - S_k &= H_kP_{k|k-1} H_k^\intercal+R_k & \text{Innovation variance} \\ - K_k &= P_{k|k-1}H_k^\intercal S_k^{-1} & \text{Optimal Kalman gain} \\ - \hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\tilde{y}_k & \text{State estimate} \\ - P_{k|k} &= (I-K_kH_k)P_{k|k-1} & \text{Covariance estimate} -\end{align*} - -After the update phase, we obtain $\hat{x}_{k|k}$, which is our best approximation of our real state. 
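The prediction and update phases described in this hunk can be illustrated with a minimal sketch. It assumes a scalar state, so that $F_k$, $H_k$, $Q_k$, $R_k$ and $B_k$ reduce to plain numbers that stay constant over time. The function names `kalman_step` and `run_filter` and all default values are hypothetical and chosen for illustration only; they are not taken from the paper or from any cited work.

```python
def kalman_step(x_prev, p_prev, z, f=1.0, h=1.0, q=1e-4, r=0.25, b=0.0, u=0.0):
    """One predict/update cycle of a scalar Kalman filter.

    x_prev, p_prev: previous state estimate and its variance
    z: new measurement
    f, h, q, r, b, u: scalar stand-ins for F_k, H_k, Q_k, R_k, B_k, u_k
    (assumed constant here; the defaults are illustrative).
    """
    # Prediction phase: project state and uncertainty forward.
    x_pred = f * x_prev + b * u
    p_pred = f * p_prev * f + q

    # Update phase: blend the prediction with the measurement.
    y = z - h * x_pred            # innovation (forecast residual)
    s = h * p_pred * h + r        # innovation variance
    k = p_pred * h / s            # Kalman gain
    x_new = x_pred + k * y        # updated state estimate
    p_new = (1.0 - k * h) * p_pred  # updated variance
    return x_new, p_new, k


def run_filter(measurements, x0=0.0, p0=1.0, **model):
    """Apply kalman_step to each measurement and collect the estimates."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        x, p, _ = kalman_step(x, p, z, **model)
        estimates.append(x)
    return estimates
```

With `h = 1`, a gain `k` close to 1 means the estimate follows the measurement, while a gain close to 0 means it follows the prediction. This is the quantity a drift-aware scheme would adjust.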
diff --git a/paper.tex b/paper.tex index bc5ccf4..a272ee4 100644 --- a/paper.tex +++ b/paper.tex @@ -10,6 +10,7 @@ % use nice mathematical symbols \usepackage{amsfonts} +\usepackage{adjustbox} \newcommand{\R}{\mathbb{R}} \newcommand{\Oc}{\mathcal{O}} @@ -29,11 +30,11 @@ \affiliation{\institution{Universität Augsburg}} \begin{abstract} - Anomaly detection is an important problem in data science, which is encountered every time when data is collected. Since this is done in so many different environments, many different research contexts and application domains in which anomaly detection was researched exist. Anomaly detection in wireless sensor networks (WSN) is a relatively new addition to anomaly detection in general, and this survey will focus on that context in particular. + Anomaly detection is an important problem in data science, which is encountered often when data is collected and analyzed. An anomaly is often defined as a measurement that is inconsistent with the expected results. Since anomaly detection can be applied to many different environments, a multitude of different research contexts and application domains exist in which anomaly detection is researched. Anomaly detection in wireless sensor networks (WSN) is a relatively new addition to the field. - The context of WSN introduces a lot of interesting new challenges, as nodes are often small devices running on battery power and cannot be do much computation on their own. Furthermore, in WSNs communication is often not perfect and messages can and will get lost during operation. Any protocols that incur additional communication must have a good justification, as communication is expensive. All these factors create a unique environment, in which not many existing solutions to the problem are applicable. + The context of WSN introduces a lot of interesting new challenges, as nodes are often small devices running on battery power and cannot do complex computation on their own. 
Furthermore, in WSNs communication is often not perfect and messages get lost during operation. Any protocols that incur additional communication must have a good justification, as communication is expensive. All these factors create a unique environment, in which not many existing solutions to the problem are applicable. - In this paper, we will not discuss anomaly detection in hostile environments, or intrusion detection, but rather focus solely on anomaly detection in sensor data collected by the WSN. + This paper will focus solely on anomaly detection in sensor data collected by the WSN. \end{abstract} \keywords{Wireless Sensor Networks, Anomaly detection, Outlier detection, Sensor calibration, Drift detection} @@ -42,10 +43,10 @@ \section{Overview} -There are many different approaches to anomaly detection, we will differentiate between centralized and decentralized approaches. An approach is considered centralized, when a large chunk of the computation is done at a single point, or at a later stage during analysis. A decentralized approach implies that a considerable amount of processing is done on the individual nodes, doing analysis on the fly. When analysis is done centralized, it is important to differentiate between online and offline detection. Online detection can run while the WSN is operating, while offline detection is done after the data is collected. Online detection often reduces mission duration due to increased power consumption, but can have the opposite effect, if it can be used to eliminate a large amount of communication. +There are many different approaches to anomaly detection, a common way to classify these is by their place of computation. An approach is considered centralized, when a large chunk of the computation is done at a single point, or at a later stage during analysis. A decentralized approach implies that a considerable amount of processing is done on the individual nodes, doing analysis while being deployed. 
It is also important to differentiate between online and offline detection. Online detection runs while the WSN is operating, whereas offline detection is done after the data is collected or during pauses in operation. Online detection often reduces mission duration due to increased power consumption, but it can have the opposite effect if the analysis reduces the amount of communication required for the WSN to function. \subsection{Anomaly types} -Furthermore we need to clarify the different kinds of anomalies that can occur in WSN data sets. Bosman et al. \cite{bosman2017} proposes four different kinds of anomalies that occur in WSN (see also figure \ref{fig:noisetypes}): +We need to clarify the different kinds of anomalies that can occur in WSN data sets. Bosman et al. \cite{bosman2017} propose four different kinds of anomalies that occur in WSN (cf. Figure~\ref{fig:noisetypes}): \begin{itemize} \item \emph{Spikes} are short changes with a large amplitude @@ -60,12 +61,14 @@ Furthermore we need to clarify the different kinds of anomalies that can occur i \label{fig:noisetypes} \end{figure} -We will look into sensor self-calibration, which often removes or reduces drift and constant offsets, followed by outlier detection to detect spikes, noise and drift type anomalies. A Noise anomaly is not the same as a noisy sensor, working with noisy data is a problem in WSN, but we will not focus on methods of cleaning noisy data, as it is not in the scope of this survey. Elnahrawy et al. \cite{elnahrawy2003} and Barcelo et al. \cite{barcelo2019} are a great places to start, if you are interested in this topic. +We will first look into sensor self-calibration, which often removes or reduces drift and constant offsets. We will then look into model-based techniques for outlier detection, followed by machine-learning-based approaches. 
Outlier detection can detect spike, noise and drift type anomalies, but has difficulty detecting constant type anomalies. + +A noise anomaly is not the same as a noisy sensor. Working with noisy data is a problem in WSN, but we will not focus on methods of cleaning noisy data, as this is outside the scope of this survey. Elnahrawy et al. \cite{elnahrawy2003} and Barcelo et al. \cite{barcelo2019} are good starting points for this topic. A fifth anomaly type, \emph{sensor failure}, is commonly added to anomaly detection \cite{rajasegarar2008,chandola2009}. Since sensor failure often manifests in these four different ways mentioned above, and we are not interested in sensor fault prediction, detection and management here, faulty sensors will not be discussed further. \section{Sensor drift and self-calibration} -Advancements in energy storage density, processing power and sensor availability have increased the possible mission time of many WSN. This increase in mission time, together with an increase in node count due to reduced part cost \cite{wang2016}, as well as the introduction of the Internet of Things (IoT) have brought forth new problems in sensor calibration and drift detection \cite{dehkordi2020}. Increasing the amount of collected data and the length of time over which it is collected introduces a need for better quality control of the sensors that data came from. Ni et al. \cite{ni2009} noticed drift as high as 200\% in soil CO$_2$ sensors, while Buonadonna et al. \cite{buonadonna2005} noticed that his light sensors (which were calibrated to the manufacturer's specification) were performing very poorly when measured against laboratory equipment. It is out of these circumstances, that the need arises for better and more frequent sensor calibration. +Advancements in energy storage density, processing power and sensor availability have increased the possible length of deployment of many WSN. 
This increase in sensor lifetime, together with an increase in node count due to reduced part cost \cite{wang2016}, as well as the introduction of the Internet of Things (IoT), have brought forth new problems in sensor calibration and drift detection \cite{dehkordi2020}. Increasing the amount of collected data and the length of time over which it is collected introduces a need for better quality control of the sensors the data comes from. Ni et al. \cite{ni2009} noticed drift as high as 200\% in soil CO$_2$ sensors, while Buonadonna et al. \cite{buonadonna2005} noticed that their light sensors (which were calibrated to the manufacturer's specification) were performing very poorly when measured against laboratory equipment. These circumstances create the need for better and more frequent sensor calibration. \begin{figure*}[ht] \includegraphics[width=\textwidth]{img/calibration_attributes.png} @@ -74,7 +77,7 @@ Advancements in energy storage density, processing power and sensor availability \end{figure*} -The field of self-calibration in WSN quite broad, in order to get an overview over all approaches Barcelo-Ordinas et al. \cite{barcelo2019} categorized each approach by seven different attributes (Figure \ref{fig:calcats}): +The field of self-calibration in WSN is quite broad. To give an overview of all approaches, Barcelo-Ordinas et al. \cite{barcelo2019} categorized each approach by seven different attributes (Figure~\ref{fig:calcats}): \begin{itemize} \item \emph{Area of interest} distinguishes between \emph{micro} (calibrating sensors to minimize error to a single data point), and \emph{macro} (calibrating nodes to minimize error over a given area of nodes). \item \emph{Number of sensors} determines whether data from other sensors is used, so-called \emph{sensor fusion}, or whether calibration is done with just a \emph{single sensor}. 
@@ -90,25 +93,66 @@ This level of specialization requires it's own survey, which most recently was B \subsection{Problems in blind self-calibration approaches} The central problem in self-calibration is predicting the error of a given sensor. Since this is such a broad problem, many different solutions exist. -Kumar et al. \cite{kumar2013} proposes a solution that uses no ground-truth sensors and can be used online in a distributed fashion. It uses spatial Kriging (gaussian interpolation) and Kalman filtering (a linear approximation model accounting for noise) on neighborhood data in order to reduce noise and remove drift. This solution suffers from accumulative error due to a missing ground truth, as the system has no point of reference or general model to rely on. The uncertainty of the model, and thereby the accumulative error can be reduced by increasing the number of sensors which are used. A common method for gaining more measurements is increasing network density \cite{wang2016}, or switching from a single-sensor approach to sensor fusion. barcelo-Ordinas et al. \cite{barcelo2018} explores the possibility of adding multiple copies of the same kind of sensor to each node. +Kumar et al. \cite{kumar2013} proposes a solution that uses no ground-truth sensors and can be used online in a distributed fashion. It uses spatial Kriging (gaussian interpolation) and Kalman filtering (a linear approximation model accounting for noise, explained in detail in \ref{sec:kalman}) on neighborhood data in order to reduce noise and remove drift. This solution suffers from accumulative error due to a missing ground truth, as the system has no point of reference or general model to rely on. The uncertainty of the model, and thereby the accumulative error can be reduced by increasing the number of sensors which are used. A common method for gaining more measurements is increasing network density \cite{wang2016}, or switching from a single-sensor approach to sensor fusion. 
Barcelo-Ordinas et al. \cite{barcelo2018} explore the possibility of adding multiple copies of the same kind of sensor to each node. \subsection{Non-blind self-calibration techniques} Non-blind, also known as reference-based, calibration approaches rely on known-good reference information. They often rely on data from much more expensive sensors, which often come with restrictions on their use. One type of non-blind calibration is done in a laboratory setting (see \cite{ramanathan2006}), where a known-good sensor is used within a controllable environment. Other approaches can calibrate instantly with a calibrated sensor nearby \cite{hasenfratz2012}, enabling calibration of multiple nodes in quick succession. Maag et al. \cite{maag2017} propose a hybrid solution, where calibrated sensor arrays can be used to calibrate other non-calibrated arrays in a local network of air pollution sensors over multiple hops with minimal accumulative errors. They show 16--60\% lower error rates than other approaches currently in use. -\subsection{Other methods for drift-correction} +\subsection{An example of blind calibration} \label{sec:kalman} +Sirisanwannakul et al. \cite{Sirisanwannakul2021} use a blind centralized approach, where humidity sensors are calibrated using Kalman filtering in combination with a neural network to detect and counteract sensor drift. Kalman filtering consists of two phases, prediction and update. Given the previous state of knowledge at step $k-1$, consisting of an estimated system state and its uncertainty, a Kalman filter calculates a prediction for the next system state and its uncertainty. This is called the prediction phase. Then, a new (possibly skewed) measurement is observed and used to compute an estimate of the actual current state and its uncertainty. This is called the update phase. The filter is recursive in nature and can be calculated with limited hardware in real time, making it useful for many different anomaly detection applications. 
+ +Kalman filters are based on a linear dynamical system on a discrete time domain. They represent the system state as vectors and matrices of real numbers. In order to use Kalman filters, the observed process must be modeled in a specific structure: + +\begin{itemize} + \item $F_k$, the state transition model for the $k$-th step + \item $H_k$, the observation model for the $k$-th step + \item $Q_k$, the covariance of the process noise + \item $R_k$, the covariance of the observation noise + \item Sometimes a control input model $B_k$ +\end{itemize} + +These models must predict the true state $x$ and an observation $z$ in the $k$-th step according to: + +\begin{align*} + x_k &= F_kx_{k-1} + B_ku_k + w_k \\ + z_k &= H_kx_k+v_k +\end{align*} + +Here $w_k$ and $v_k$ are noise terms drawn from zero-mean multivariate normal distributions $\mathcal{N}$ with covariances $Q_k$ and $R_k$ respectively ($w_k \sim \mathcal{N}(0,Q_k)$ and $v_k \sim \mathcal{N}(0,R_k)$). + +The Kalman filter state is represented by two variables $\hat{x}_{k|j}$ and $P_{k|j}$, which are the state estimate and covariance at step $k$ given observations up to and including $j$. + +When entering step $k$, we can now define the two phases. \textbf{Prediction phase:} +\begin{align*} + \hat{x}_{k|k-1} &= F_k \hat{x}_{k-1|k-1}+B_ku_k \\ + P_{k|k-1} &= F_kP_{k-1|k-1} F_k^\intercal+Q_k +\end{align*} +Here we predict the next state and calculate our confidence in that prediction. Given the measurement $z_k$, we enter the next phase. 
\\ \textbf{Update phase:} + +\begin{align*} + \tilde{y}_k &= z_k - H_k\hat{x}_{k|k-1} & \text{Innovation (forecast residual)} \\ + S_k &= H_kP_{k|k-1} H_k^\intercal+R_k & \text{Innovation covariance} \\ + K_k &= P_{k|k-1}H_k^\intercal S_k^{-1} & \text{Optimal Kalman gain} \\ + \hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\tilde{y}_k & \text{State estimate} \\ + P_{k|k} &= (I-K_kH_k)P_{k|k-1} & \text{Covariance estimate} +\end{align*} + +After the update phase, we obtain $\hat{x}_{k|k}$, which is our best approximation of the real state. + +Sirisanwannakul et al. determine the sensor's status from two measures based on the $L^1$ distance and the cosine similarity, and adjust the Kalman gain accordingly. In normal operation, the gain is biased towards the measurement. If the sensor malfunctions, the gain is biased towards the prediction. If sensor drift is detected, the gain varies between prediction and measurement, which corrects the data automatically. Since this approach lacks a ground truth measurement it cannot recalibrate the sensor, but they show that the accumulated mean absolute deviation can be reduced by approximately 55.66\%. \section{Outlier detection - model-based approaches} -When we speak of a centralized WSN, we mean, that there exists a central entity, called the \emph{base station} or \emph{fusion centre}, where all data is delivered to and analyzed. It is often assumed, that the base station does not have limits on its processing power or storage. Centralized approaches are not optimal in hostile environments, but that is not our focus here. Since central anomaly detection is closely related to the general field of anomaly detection, we will not go into much detail on these solution, instead focusing on covering solutions more specific to the field of WSN. +A centralized WSN is defined by the existence of a central entity, called the \emph{base station} or \emph{fusion centre}, to which all data is delivered and where it is analyzed. It is often assumed that the base station does not have limits on its processing power or storage. 
Centralized approaches are not optimal in hostile environments, but that is not our focus here. Since central anomaly detection is closely related to the general field of anomaly detection, we will not go into much detail on these solutions, instead focusing on solutions more specific to the field of WSN. \subsection{Statistical analysis} -Classical Statistical analysis is done by creating a model of the expected data and then finding the probability for each recorded data point. Improbable data points are then deemed outliers. The problem for many statistical approaches is finding this model of the expected data, as it's not always feasible to create it in advance. It also bears the problem of bad models or slow changes in the environment \cite{mcdonald2013}. +Classical statistical analysis is done by creating a model of the expected data and then finding the probability for each recorded data point. Improbable data points are then deemed outliers. The problem for many statistical approaches is finding this model of the expected data, as it is not always feasible to create it in advance. It also suffers from inaccurate models and slow changes in the environment \cite{mcdonald2013}. -Sheng et al. \cite{sheng2007} proposes a new approach, where histograms of each node are polled, combined, and then analyzed for outliers by looking at the maximum distance a data point can be away from his nearest neighbors. This solution has several problems, as it incurs a considerable communication overhead and fails to account for non gaussian distribution. Since the this approach uses fixed parameters, it also requires updating them every time the expected data changes. +Sheng et al. \cite{sheng2007} propose a new approach, where histograms of each node's sensor data are polled, combined, and then analyzed for outliers by looking at the maximum distance a data point can be away from its nearest neighbors. 
This solution has several problems, as it incurs a considerable communication overhead and fails to account for non-Gaussian distributions. Since this approach uses fixed parameters, they must also be updated every time the expected data changes. -Böhm et al. \cite{böhm2008} proposes a solution not only to non gaussian distributions, but also to noisy data. He defines a general probability distribution function (PDF) with an exponential distribution function (EDF) as a basis, which is better suited to fitting around non gaussian data as seen in figure \ref{fig:probdistböhm}. He then outlines an algorithm where the data is split into clusters, for each cluster an EDF is fitted and outliers are discarded. +Böhm et al. \cite{böhm2008} propose a solution not only to non-Gaussian distributions, but also to noisy data. They define a general probability distribution function (PDF) with an exponential distribution function (EDF) as a basis, which is better suited to fitting non-Gaussian data, as seen in Figure~\ref{fig:probdistböhm}. They then outline an algorithm in which the data is split into clusters, an EDF is fitted to each cluster, and outliers are discarded. \begin{figure} \includegraphics[width=8.5cm]{img/probability-dist-böhm.png} @@ -117,9 +161,9 @@ Böhm et al. \cite{böhm2008} proposes a solution not only to non gaussian distr \end{figure} \subsection{Density based analysis} -Outliers can be selected by looking at the density of points as well. Breuning et al. \cite{breuning2000} proposes a method of calculating a local outlier factor (LOF) of each point based on the local density of its $n$ nearest neighbors. The problem lies in selecting good values for $n$. If $n$ is too small, clusters of outliers might not be detected, while a large $n$ might mark points as outliers, even if they are in a large cluster of $ as it is +* HE => THEY +* write why non-blind calibration is bad +* the paper => they +* Chan et al. 
\cite{chan2012} propose a solution to this problem: they develop two methods to approximate the eigenvalue decomposition by updating the state recursively and reusing large parts of the calculation already done, which reduces the computational complexity. They simulate this algorithm on existing data sets and find it outperforms existing PCA based solutions such as \cite{li2000, tien2004}. +* GHA is missing its result +* extreme learning: first paragraph is broken +* expand extreme learning +* SVM Rajasegarar cites are weird +* https://scihubtw.tw/10.1145/3134302.3134337 https://netlibrary.aau.at/obvuklhs/content/titleinfo/5395523/full.pdf ## proposed structure