Robust Extended Kalman Filtering in Hybrid Positioning Applications. Master of Science Thesis


Tommi Perälä
Robust Extended Kalman Filtering in Hybrid Positioning Applications
Master of Science Thesis
Examiner: Professor Robert Piché (TUT)
Examiner and topic approved in the council meeting of the Faculty of Science and Environmental Engineering on April 9, 2008

Preface

This Master of Science Thesis was written at the Department of Mathematics at the Tampere University of Technology. Since I have been working as a research assistant in the Personal Positioning Algorithms Research Group, I also decided to dedicate my thesis to this research area. At this point, I want to thank my supervisor Prof. Robert Piché for his valuable advice and for the opportunity to work in this group. I want to express my gratitude to my colleagues in the Personal Positioning Algorithms Research Group for their help and enlightening discussions. Moreover, I want to thank everyone who has supported me during my studies. Special thanks go to my parents, brother, grandparents and my girlfriend Juliane. Your support has not gone unnoticed. This study was funded by Nokia Corporation. The EKF used for comparison was implemented by Simo Ali-Löytty, and the simulations were generated using the Personal Navigation Filter Framework, which is implemented mostly by Niilo Sirola and Matti Raitoharju.
Tampere, September 19, 2008
Tommi Perälä
Kemiankatu 4 A 20, 33720 Tampere, Finland
Tel. 044-5009743

Abstract

TAMPERE UNIVERSITY OF TECHNOLOGY
Faculty of Science and Environmental Engineering, Department of Mathematics
Perälä, Tommi: Robust Extended Kalman Filtering in Hybrid Positioning Applications
Master of Science Thesis, 60 pages and 10 appendix pages
Examiner: Professor Robert Piché
Keywords: Positioning, Robust Filtering, Kalman Filter, Extended Kalman Filter

Location-based services require accurate information about the position of the user in order to function properly. In outdoor environments, positioning devices that use signals from the Global Navigation Satellite Systems (GNSS) can usually infer the user's location with sufficient accuracy. In addition, other signal sources, e.g., the base stations of terrestrial networks, might be used to improve the position estimate. A non-linear extension of the Kalman Filter, namely, the Extended Kalman Filter (EKF), has been studied in hybrid positioning applications, and because of its light computational demand and relatively high accuracy, it seems to be a feasible algorithm for today's mobile positioning devices. However, in dense urban areas, buildings or other obstacles may affect the signal propagation by reflecting or partly absorbing the signals before they reach the receiver. These disturbances in the signal environment produce so-called blunder measurements, and their effect on the position estimate is, in general, extremely difficult to model. Although EKF performs well in open areas, its performance degrades drastically in dense urban areas because of the above-mentioned disturbances. Therefore, a filter that better tolerates blunder measurements needs to be developed.

In this work, the hybrid positioning problem is formulated as a robust non-linear filtering problem, and two robust filters, namely, the Approximate Bayesian Extended Kalman Filter and the Re-weighted Extended Kalman Filter, are derived based on EKF. A combination of the two methods, called the Hybrid Extended Kalman Filter, is also proposed. The filter algorithms were implemented in Matlab and tested in simulations and using real GPS measurements. Simulations with satellite pseudorange and deltarange measurements, and base station range and altitude measurements, show that the developed algorithms usually perform better than EKF when blunder measurements occur. Tests using real GPS measurement data validate the simulation results.

Tiivistelmä

TAMPERE UNIVERSITY OF TECHNOLOGY
Faculty of Science and Environmental Engineering, Department of Mathematics
Perälä, Tommi: Vikasietoinen laajennettu Kalmanin suodatin hybridipaikannussovelluksissa (Robust Extended Kalman Filtering in Hybrid Positioning Applications)
Master of Science Thesis, 60 pages and 10 appendix pages
Examiner: Professor Robert Piché
Keywords: Positioning, Robust Filtering, Kalman Filter, Extended Kalman Filter

Location-based services need accurate knowledge of the user's position in order to work satisfactorily. Positioning devices that infer the user's position from the signals of dedicated satellite navigation systems usually work almost flawlessly outdoors, when visibility to the satellites is good. In addition to satellite signals, the signals of terrestrial base stations can also be exploited for positioning. The Extended Kalman Filter (EKF) is a widely used non-linear filter whose suitability for positioning has also been studied. EKF appears to be well suited to portable positioning devices owing to its light computational load and relatively good accuracy. It has been observed, however, that in densely built urban areas buildings and other obstacles disturb signal propagation, for example by reflecting or attenuating the signals transmitted by base stations and satellites. These phenomena show up in the measurements as errors that are in practice extremely laborious and difficult to model or predict. Such measurements are called blunder measurements. Although the largest errors can usually be screened out easily, some of them may still go unnoticed.

Although EKF works well in open areas, its performance degrades considerably when blunder measurements are processed. This degradation has turned out to be a problem in current positioning applications, which are expected to work seamlessly in all kinds of environments, and it motivates the development of more fault-tolerant positioning algorithms. The aim of this work is to develop more robust filtering algorithms that could be used in future positioning applications. The hybrid positioning problem is formulated as a robust non-linear filtering problem, and two robust non-linear filters, the Approximate Bayesian Extended Kalman Filter (ABEKF) and the Re-weighted Extended Kalman Filter (REKF), are proposed as solutions. A combination of the two, the Hybrid Extended Kalman Filter (HEKF), is also considered briefly. The filter algorithms presented in this work closely resemble the EKF algorithm both in structure and in computational requirements, and they are therefore well suited to portable positioning devices. The algorithms were implemented in Matlab, and their performance was tested with simulations and with real GPS measurement data. The simulations used satellite pseudorange and deltarange measurements, base station range measurements and altitude measurements. In the simulations the robust filters generally performed better than EKF: REKF and HEKF always performed better than EKF when blunder measurements were present, whereas ABEKF performed better than EKF only when base station measurements were included, and it never performed better than REKF or HEKF. The same conclusions were reached with the real GPS measurement data: REKF and HEKF improved the position estimates compared to EKF, whereas ABEKF did not perform well. The optimistic variances reported by the measurement device were suspected to be the reason for the poor performance of ABEKF.

Contents
Preface
Abstract
Tiivistelmä
Abbreviations and Acronyms
Symbols
1 Introduction
2 Preliminaries
2.1 Stochastic Processes
2.2 The Kalman Filter
2.3 The Extended Kalman Filter
2.4 M-estimators
2.4.1 Minimax Robustness
2.4.2 Contamination Classes
2.4.3 Computation of the M-estimates
2.4.4 Other Examples of M-estimators
3 Robust Kalman Filtering
3.1 Approximate Bayesian Kalman Filter
3.2 Re-weighted Kalman Filter
3.3 Hybrid Kalman Filter
4 Applications in Hybrid Positioning
4.1 The State Model of the User
4.2 The Measurement Model
4.2.1 Pseudorange Measurements
4.2.2 Deltarange Measurements
4.2.3 Base Station Range Measurements
4.2.4 Combined Measurements
5 Simulations and Tests
5.1 Simulation Setup
5.2 Simulation Results
5.3 Tests Using Real GPS Data
5.4 Test Results
6 Conclusions
A Tables
B Lemmas
C Algorithms

Abbreviations and Acronyms
ABEKF - Approximate Bayesian Extended Kalman Filter
ABKF - Approximate Bayesian Kalman Filter
cdf - Cumulative distribution function
cpdf - Conditional probability density function
DHA - Damped Hampel estimator
EKF - Extended Kalman Filter
GLONASS - Globalnaya Navigatsionnaya Sputnikovaya Sistema
GNSS - Global Navigation Satellite Systems
GPS - Global Positioning System
H - Huber estimator
HA - Hampel estimator
HEKF - Hybrid Extended Kalman Filter
KF - Kalman Filter
M - p-point estimator
MAP - Maximum a posteriori
MMSE - Minimum mean-square error
pdf - Probability density function
RKF - Re-weighted Kalman Filter
REKF - Re-weighted Extended Kalman Filter
WLAN - Wireless local area network

Symbols integral over a particular space infinity x 1 vector of ones b clock bias ḃ clock drift C 1 C δ x det D ǫ ǫ E E(x y) E(x y = y) E F f x gradient operator with respect to x set of differentiable functions covariance matrix of an estimator an operator that puts mass 1 to point x determinant operator difference mapping amount of contamination in F ǫ measurement error expectation value operator conditional expectation value conditional expectation value conditioned on y = y estimator specific parameter probability density function of x fǫ 0 fp 0 F F ǫ F p G h H I K least favorable density of F ǫ least favorable density of F p σ-algebra or set of densities ǫ-contaminated normal neighborhood p-point family state transition matrix measurement function linear measurement function or the Jacobian of the measurement function identity matrix threshold parameter Kalman gain matrix

K W Λ L l ln N n s n x n y ω DHA ω Fǫ ω Fp ω HA Ω (Ω, F, P) P p p b φ Φ ψ ψ IF ψ DHA ψ Fǫ ψ Fp ψ HA q Q ρ ρ ρ s ρ s ρ b R R W R r r re-weighted Kalman gain matrix diagonal matrix of eigenvalues likelihood function log-likelihood function natural logarithm function set of natural numbers number of satellites dimension of the state dimension of the measurements weight function of the Damped Hampel M-estimator weight function of Huber's M-estimator weight function of the p-point M-estimator weight function of Hampel's three-parts redescending M-estimator sample space probability space covariance matrix of the state x parameter for F p blunder probability standard normal probability density function standard normal cumulative distribution function ψ-function of an M-estimator influence function influence function of the Damped Hampel M-estimator influence function of Huber's M-estimator influence function of the p-point M-estimator influence function of Hampel's three-parts redescending M-estimator eigenvector state model noise covariance matrix or matrix of eigenvectors ρ-function of an M-estimator rejection point of an M-estimator vector of pseudorange measurements vector of deltarange measurements vector of base station range measurements measurement noise covariance matrix re-weighted measurement noise covariance matrix set of real numbers transformed innovation parameter for ψ DHA or satellite/base station position

r u sign Σ > 0 σ a σ p σ 2 s user's position signum function matrix Σ is positive definite velocity error in the vertical direction velocity error in East-North plane variance stochastic variable describing the innovation s m θ ˆθ T U b U s u V s V V(x y) V(x y = y) v v u v s W w x x x + parameter for ψ Fp parameter estimator transformation matrix matrix of unit vectors pointing from the user to base stations matrix of unit vectors pointing from the user to satellites unit vector pointing from the user to a satellite matrix of satellite velocities covariance matrix operator conditional covariance matrix conditional covariance matrix conditioned on y = y stochastic process of measurement noise velocity of the user velocity of a satellite weight matrix stochastic process of state update noise stochastic variable describing the state stochastic variable describing the prior state stochastic variable describing the posterior state x N p (µ, Σ) x has a p-dimensional normal probability density function with mean µ and covariance matrix Σ y stochastic variable describing the measurements y y p stochastic variable describing the predicted measurements threshold parameter for p-point estimator Bold font (x, ˆθ,...) refers to a stochastic variable or an estimator. Normal font (x, ˆθ,...) refers to a realization of a stochastic variable, estimate or a parameter.

Capital letters (ˆP, H,...) are used to denote matrices, operators or functions. A hat above a symbol (ˆθ, ˆx,...) denotes an estimator or an estimate, and a check ȟ denotes a measurement equation without the bias term. Subscripts: k: at time step t_k; 0: initial condition or state, or minimax robust estimator; 1:k: y_{1:k} = {y_i, i = 1, ..., k}; n×m: matrix has dimension n × m. Superscripts: −1: matrix inverse or reciprocal; T: transpose; −: prior state; +: posterior state; 0: function relating to the least favorable density; W: re-weighted matrix.

Chapter 1 Introduction

Hybrid positioning refers to the estimation of one's location by combining many different sources of information. This information is usually obtained in the form of measurements which may be, for example, pseudorange or deltarange measurements from satellites. In addition, various wireless networks on Earth, for example, cellular networks, WLAN or Bluetooth, provide means for positioning in the form of range measurements, received signal strength indicators and sector information. Portable positioning devices may also contain inertial measurement units that provide information about the movements of the user. The Global Positioning System (GPS) is currently the only fully functional Global Navigation Satellite System (GNSS) designed mostly for positioning purposes. In the near future the European Galileo and the Russian GLONASS satellite systems will also be available for positioning. In open areas these satellite systems provide a sufficient number of accurate measurements that can be used for calculating a reliable estimate of the position of the user. However, the accuracy drastically decreases in dense urban areas where the view to the satellites might be blocked, for example, by buildings. Due to these obstacles it might be that there are not enough satellites visible for determining a unique position solution. The signals might also be reflected from certain surfaces, like walls and windows, before they reach the receiver and thus give false information about the position of the user. The same problem arises with all positioning methods that use radio signals, especially in indoor environments. The positioning problem may be formulated as an inverse problem. Measurements related to the state are obtained, and the relation between the measurements and the state is assumed to be known. The problem is then to solve the state using the measurements. The difficulty of the problem is, however, that the relation

between the state and the measurements is usually non-linear, and, in general, it is not possible to formulate the solution of the state analytically. Moreover, the exact solution might not be well-defined, since the state might be multidimensional and the number of available measurements inadequate for a unique solution to exist. One way to solve the inverse problem is to formulate it as a filtering problem. Bayesian filtering offers a suitable framework for dynamic real-time positioning applications. The aim of Bayesian filtering is to solve the conditional probability density function (cpdf) of the state conditioned on all measurements. In hybrid positioning applications it is rarely possible to solve the cpdf analytically. Therefore, many approximate solutions for this problem have been invented. Algorithms for real-time hybrid positioning must have low computational complexity in order to be used in today's mobile devices. Although Bayesian filtering provides a theoretically complete framework for optimal non-linear filtering, it requires the evaluation of multi-dimensional integrals, which in general have to be computed using approximate numerical techniques such as Monte Carlo. High accuracy can often only be achieved with a huge amount of computation time and memory. However, in the linear-Gaussian case the integrals can be solved analytically and the computation consists of only a relatively small number of matrix multiplications, sums and inversions. The resulting filter, namely, the Kalman Filter (KF) [1], has low memory requirements because, due to the Gaussianity of the densities, only the first two moments of the densities need to be stored. Kalman filtering has been the subject of extensive study in hybrid positioning applications in recent years. Simulations show that a non-linear extension of KF, namely, the Extended Kalman Filter (EKF), might be used in hybrid positioning applications in today's small mobile devices [2]. However, real-life situations have shown that the assumptions of KF are too strict and even a slight deviation from the assumptions degrades the performance of the filter drastically. Therefore, algorithms that work nearly as well as KF when the assumptions of KF are valid, and that manage better under unfavorable conditions, must be developed. For this purpose, the theory of robust estimation is applied to KF. This thesis is organized as follows. Chapter 2 starts with an introduction to the filtering theory. Next, KF and EKF are discussed in more detail, and a class of estimators, namely, the M-estimators introduced by P. J. Huber [3], are presented.

In Chapter 3, two robust alternatives to KF are presented. The first filter is based on the ideas of Martin and Masreliez [4, 5]. For the derivation of the second filter, it is shown that KF is the solution of a least squares problem. The least squares problem is reformulated using M-estimators, and the reformulated problem is then solved with another kind of robust KF. Chapter 4 discusses the state model of the user and the measurements used in hybrid positioning. The methods proposed in Chapter 3 were implemented in MATLAB and tested in simulations and using real GPS data. Chapter 5 discusses the simulations and tests. In Chapter 6, conclusions are drawn and future research is outlined.

Chapter 2 Preliminaries

This chapter introduces the basic concepts of stochastic processes and the general filtering problem. The linear Bayesian filter, namely, the Kalman Filter, which is a solution to an important special case, is discussed in Section 2.2. Section 2.3 is devoted to the Extended Kalman Filter, a non-linear extension of KF. M-estimators are introduced, and the most robust M-estimators for two classes of densities are given in Section 2.4.

2.1 Stochastic Processes

The following definitions may be found, for example, in the references [6] and [7]. It is assumed that basic concepts like the sample space (denoted by $\Omega$), the probability space (denoted by $(\Omega, F, P)$), the stochastic variable (denoted by a bold letter, for example x), the probability density function (pdf) of a stochastic variable (denoted by $f_x(x)$), independence, and the joint pdf of x and y (denoted by $f_{x,y}(x, y)$) are known to the reader. These concepts may be found for example in [8].

Definition 2.1 (Stochastic Process). Let $(\Omega, F, P)$ be a probability space and $\Theta$ a discrete set of parameters. A stochastic process is a mapping $x : \Omega \times \Theta \to \mathbb{R}^n$, such that $x(\cdot, t)$ is a stochastic variable for every $t \in \Theta$. For each $\omega \in \Omega$, $x(\omega, \cdot)$ is the realization of the process or a sample sequence. For simplicity the notation $x_k$ is used for a stochastic process.

Definition 2.2 (Conditional Probability Density Function, Bayes Rule). The conditional probability density function of a stochastic variable x conditioned on y = y is defined by
\[ f_{x|y}(x \mid y) = \frac{f_{x,y}(x, y)}{f_y(y)}, \]

where the denominator is positive.

Corollary 2.3 (The Second Form of Bayes Rule). It follows from Definition 2.2 that
\[ f_{x|y}(x \mid y) = \frac{f_{y|x}(y \mid x) f_x(x)}{f_y(y)} = \frac{f_{y|x}(y \mid x) f_x(x)}{\int f_{y|x}(y \mid x) f_x(x)\, dx}. \quad (2.1) \]
The derivation of the above formulation is given for example in [7, p. 81].

Definition 2.4 (Conditional Mean). The conditional mean of a stochastic variable x conditioned on y = y is defined as
\[ E(x \mid y = y) = \int x f_{x|y}(x \mid y)\, dx. \]

Definition 2.5 (Markov Process). A stochastic process $x_k$, $k \in \mathbb{N}$, is called a Markov process if for all $k \in \mathbb{N} \setminus \{0\}$
\[ f_{x_k \mid x_{k-1}, \ldots, x_1}(x_k \mid x_{k-1}, \ldots, x_1) = f_{x_k \mid x_{k-1}}(x_k \mid x_{k-1}). \]

Definition 2.6 (White Process). A Markov process $x_k$, $k \in \mathbb{N}$, is called white if for all $k \in \mathbb{N} \setminus \{0\}$
\[ f_{x_k \mid x_{k-1}}(x_k \mid x_{k-1}) = f_{x_k}(x_k). \]

The state of the dynamic process is modeled as a stochastic process $x_k$, $k \in \mathbb{N}$. The initial state is denoted by $x_0$. Generally, the process dynamics are governed by a non-linear differential equation. In order to be implemented in digital devices, the differential equations have to be discretized. The emphasis of this work is on the measurement model, thus the case where the process dynamics are described by a linear difference equation
\[ x_{k+1} = G_k x_k + w_k, \quad (2.2) \]
is considered. Here $x_k$ is the state at time step $t_k$ and $w_k$ is a zero mean, white noise process with covariance matrix $Q_k$. The noise $w_k$ is assumed to be independent of the initial state $x_0$. The sequence generated by (2.2) is thus also Markov [6, p. 86]. The measurement equation
\[ y_k = h_k(x_k) + v_k, \quad (2.3) \]
where $h_k \in C^1$, describes the connection between the state and the measurements $y_k$. The measurement noise $v_k$ is assumed to be white and independent of $x_0$ and

the state noise $w_k$. Thus, the joint process $[x_k, y_k]^T$ is also a Markov process [6, p. 143]. The state conditioned on all realized measurements is called the posterior state and is denoted by $x_k^+$. The pdf of the posterior state is denoted by $f_{x_k^+}(x_k \mid y_{1:k})$. The state conditioned on the past realized measurements is called the prior state and is denoted by $x_k^-$, and its pdf is denoted by $f_{x_k^-}(x_k \mid y_{1:k-1})$. It follows from the whiteness of the noise processes and the Bayes rule that the posterior pdf may be formulated as
\[ f_{x_k^+}(x_k \mid y_{1:k}) = \frac{f_{y_k \mid x_k}(y_k \mid x_k)\, f_{x_k^-}(x_k \mid y_{1:k-1})}{f_{y_k}(y_k \mid y_{1:k-1})}. \quad (2.4) \]
The derivation of this form of the Bayes rule may be found for example in [7, p. 210]. The posterior pdf is thus proportional to the product of the prior pdf and the measurement likelihood function $f_{y_k \mid x_k}(y_k \mid x_k) = f_{v_k}(y_k - h_k(x_k))$, which is given by the measurement model. The normalization factor
\[ f_{y_k}(y_k \mid y_{1:k-1}) = \int f_{y_k \mid x_k}(y_k \mid x_k)\, f_{x_k^-}(x_k \mid y_{1:k-1})\, dx_k \quad (2.5) \]
is necessary to ensure that $f_{x_k^+}(x_k \mid y_{1:k})$ is a pdf. The measurement conditioned on the previous measurements is denoted by $y_k^-$. The normalization factor is also called the predicted measurement density or the innovation density. Estimation based on the posterior density is called Bayesian estimation. Sequential Bayesian estimation is called filtering. A desired estimate of the state $x_k$ with an estimate of the error may be calculated from the posterior pdf (2.4). For example, the minimum mean-square error (MMSE) estimate is the conditional mean of $x_k$ [9]. Estimation based on the maximization of the cpdf is called maximum a posteriori (MAP) estimation. Assuming that the densities are symmetric and unimodal, the MAP estimate is the same as the MMSE estimate. The MAP estimate is the most probable value the state may have and is thus considered the optimal estimate of the state. Although the Bayesian framework provides means for optimal non-linear filtering, it requires the evaluation of multi-dimensional integrals which, in general, cannot be solved analytically. Instead, approximate numerical methods such as Monte Carlo have to be used. Unfortunately, numerical methods are computationally too demanding to be used in today's mobile positioning devices. However, in

1960, Rudolf E. Kalman introduced the famous Kalman Filter, which solves the cpdf analytically in an important special case. This filter is discussed in the next section.

2.2 The Kalman Filter

Definition 2.7 (Weighted Vector 2-norm). Let $a \in \mathbb{R}^n$ be a vector and $C \in \mathbb{R}^{n \times n}$ be a symmetric positive definite matrix. Then, the weighted vector 2-norm is defined as
\[ \|a\|_C = \sqrt{a^T C a}. \]

Definition 2.8 (Normal Distribution). A stochastic variable x is said to be normally distributed or Gaussian if its probability density function $f_x(x)$ is of the form
\[ f_x(x) = \frac{1}{\sqrt{\det(2\pi\Sigma)}}\, e^{-\frac{1}{2}\|x - \mu\|^2_{\Sigma^{-1}}}, \]
where $\mu$ is the mean and $\Sigma > 0$ is the covariance matrix of x.

Later on the notation $x \sim N_p(\mu, \Sigma)$ will be used for p-dimensional normally distributed stochastic variables. If $p = 1$ the subscript is left out. If $p = 1$, $\mu = 0$ and $\Sigma = 1$, x has a standard normal pdf. The standard normal pdf is denoted by $\phi(x)$ and the standard normal cumulative distribution function (cdf) by $\Phi(x)$. KF was first introduced in [1] and solves the cpdf (2.4) when the measurement function (2.3) is linear and the noises and the initial state are normally distributed. Given these assumptions, the posterior pdf (2.4) is also a normal density and it is completely characterized by its first two moments, i.e., the mean and the covariance matrix. Consider the linear state model given in (2.2) and let the state vector be $x_k$ and the dimension of the state $n_x \ge 1$. The state transition matrix is $G_k \in \mathbb{R}^{n_x \times n_x}$ and the process noise is given by $w_k \sim N_{n_x}(0, Q_k)$. Let the initial state be $x_0 \sim N_{n_x}(x_0, P_0)$, and assume that the covariance matrices $Q_k$ and $P_0$ are symmetric positive definite. Let the measurement model in (2.3) be linear and denote the linear measurement function by $H_k \in \mathbb{R}^{n_y \times n_x}$, where $n_y \ge 1$ is the dimension of the measurement vector at time step $t_k$. The linear measurement equation may be written as
\[ y_k = H_k x_k + v_k, \quad (2.6) \]

where the measurement noise $v_k \sim N_{n_y}(0, R_k)$ and $R_k$ is assumed symmetric positive definite. Next, the functions on the right-hand side of (2.4) are derived. According to the state model (2.2), the mean of the prior density is
\[ \hat{x}_k^- = E(x_k \mid y_{1:k-1}) = E(G_{k-1} x_{k-1} + w_{k-1} \mid y_{1:k-1}) = E(G_{k-1} x_{k-1} \mid y_{1:k-1}) + E(w_{k-1}) = G_{k-1} E(x_{k-1} \mid y_{1:k-1}) = G_{k-1} \hat{x}_{k-1} \]
and the covariance matrix
\[ \hat{P}_k^- = V(x_k \mid y_{1:k-1}) = V(G_{k-1} x_{k-1} + w_{k-1} \mid y_{1:k-1}) = G_{k-1} V(x_{k-1} \mid y_{1:k-1}) G_{k-1}^T + Q_{k-1} = G_{k-1} \hat{P}_{k-1} G_{k-1}^T + Q_{k-1}. \]
The prior density function may be written as
\[ f_{x_k^-}(x_k \mid y_{1:k-1}) = \frac{\exp\left(-\frac{1}{2}\|x_k - \hat{x}_k^-\|^2_{(\hat{P}_k^-)^{-1}}\right)}{\sqrt{\det(2\pi \hat{P}_k^-)}}. \quad (2.7) \]
Using (2.6), the mean of the predicted measurement is
\[ E(y_k \mid y_{1:k-1}) = H_k E(x_k \mid y_{1:k-1}) + E(v_k \mid y_{1:k-1}) = H_k \hat{x}_k^- \]
and the covariance matrix
\[ V(y_k \mid y_{1:k-1}) = H_k V(x_k \mid y_{1:k-1}) H_k^T + V(v_k \mid y_{1:k-1}) = H_k \hat{P}_k^- H_k^T + R_k. \]
Thus, the predicted measurement density may be written as
\[ f_{y_k}(y_k \mid y_{1:k-1}) = \frac{\exp\left(-\frac{1}{2}\|y_k - H_k \hat{x}_k^-\|^2_{(H_k \hat{P}_k^- H_k^T + R_k)^{-1}}\right)}{\sqrt{\det\left(2\pi (H_k \hat{P}_k^- H_k^T + R_k)\right)}}. \quad (2.8) \]
Define $s_k = (y_k \mid y_{1:k-1}) - E(y_k \mid y_{1:k-1})$ and call it the innovation at time step $t_k$. Since $E(s_k) = 0$ and $V(s_k) = V(y_k \mid y_{1:k-1})$, the pdf of the innovation may be written as
\[ f_{s_k}(s_k) = \frac{\exp\left(-\frac{1}{2}\|s_k\|^2_{(H_k \hat{P}_k^- H_k^T + R_k)^{-1}}\right)}{\sqrt{\det\left(2\pi (H_k \hat{P}_k^- H_k^T + R_k)\right)}}. \]

The measurement likelihood function may be written as
\[ f_{y_k \mid x_k}(y_k \mid x_k) = f_{v_k}(y_k - H_k x_k) = \frac{\exp\left(-\frac{1}{2}\|y_k - H_k x_k\|^2_{R_k^{-1}}\right)}{\sqrt{\det(2\pi R_k)}}. \quad (2.9) \]
The measurement likelihood function is not a probability density function of the state; instead, it tells how likely a certain state $x_k$ is given the measurements $y_k$. Inserting (2.7), (2.8), and (2.9) into (2.4) yields the posterior density
\[ f_{x_k^+}(x_k \mid y_{1:k}) = \frac{\exp\left(-\frac{1}{2}\|x_k - \hat{x}_k\|^2_{\hat{P}_k^{-1}}\right)}{\sqrt{\det(2\pi \hat{P}_k)}}, \]
where the posterior mean and the posterior covariance are
\[ \hat{x}_k = \hat{x}_k^- + K_k (y_k - H_k \hat{x}_k^-), \quad (2.10) \]
\[ \hat{P}_k = (I - K_k H_k) \hat{P}_k^-, \]
and
\[ K_k = \hat{P}_k^- H_k^T (H_k \hat{P}_k^- H_k^T + R_k)^{-1} \quad (2.11) \]
is called the Kalman gain matrix. The proof is fundamental but tedious and is thus omitted here. An interested reader is referred to [7].

2.3 The Extended Kalman Filter

KF is the solution for the linear dynamic system defined in (2.2) and (2.6). The problem in practical applications is that the equations describing the physical system are generally non-linear. Therefore, non-linear extensions of KF have been studied. One of these extensions is the Extended Kalman Filter [6], which will be discussed next. In EKF the non-linear state and measurement equations are linearized using the first-order Taylor series expansion. The state update function is linearized at the posterior mean of the previous time step and the measurement function at the prior mean of the current time step. In this work the state model is assumed to be linear, and thus the linearization is done only for the measurement model.
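The update (2.10)-(2.11) amounts to a handful of matrix products and a single matrix inversion per time step, which is what makes KF attractive for mobile devices; the EKF below reuses exactly this update with a linearized measurement matrix. The following is a minimal illustrative sketch in Python/NumPy (the filters in this thesis were implemented in Matlab; all function and variable names below belong to the sketch, not to the thesis code):

import numpy as np

def kf_predict(x_post, P_post, G, Q):
    # Prior mean and covariance from the linear state model (2.2)
    x_prior = G @ x_post
    P_prior = G @ P_post @ G.T + Q
    return x_prior, P_prior

def kf_update(x_prior, P_prior, y, H, R):
    # Innovation s_k and its covariance H_k P_k^- H_k^T + R_k
    s = y - H @ x_prior
    S = H @ P_prior @ H.T + R
    # Kalman gain (2.11), posterior mean (2.10) and posterior covariance
    K = P_prior @ H.T @ np.linalg.inv(S)
    x_post = x_prior + K @ s
    P_post = (np.eye(len(x_prior)) - K @ H) @ P_prior
    return x_post, P_post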

Consider the non-linear measurement equation (2.3). The first-order Taylor series approximation of the measurement function at the prior mean is
\[ h_k(x_k) \approx h_k(\hat{x}_k^-) + H_k (x_k - \hat{x}_k^-), \]
where the Jacobian of the measurement function is
\[ H_k = \left. \frac{\partial h_k(x_k)}{\partial x_k} \right|_{x_k = \hat{x}_k^-}. \]
Denoting $\bar{y}_k = h_k(\hat{x}_k^-)$ and inserting into (2.3) gives
\[ y_k = \bar{y}_k + H_k (x_k - \hat{x}_k^-) + v_k. \]
Re-arranging and denoting $\Delta x_k = x_k - \hat{x}_k^-$ and $\Delta y_k = y_k - \bar{y}_k$ yields
\[ \Delta y_k = H_k \Delta x_k + v_k. \]
Using the KF relations introduced in the previous section, the mean of $\Delta x_k \mid y_{1:k}$ becomes
\[ \Delta\hat{x}_k = \Delta\hat{x}_k^- + K_k (\Delta y_k - H_k \Delta\hat{x}_k^-) \;\Leftrightarrow\; \hat{x}_k - \hat{x}_k^- = \Delta\hat{x}_k^- + K_k (y_k - \bar{y}_k - H_k \Delta\hat{x}_k^-). \]
Noting that $\Delta\hat{x}_k^- = \hat{x}_k^- - \hat{x}_k^- = 0$ and re-arranging gives the posterior mean estimate
\[ \hat{x}_k = \hat{x}_k^- + K_k (y_k - h_k(\hat{x}_k^-)). \]
Since $V(\Delta x_k) = V(x_k - \hat{x}_k^-) = V(x_k)$, the posterior covariance is given by
\[ \hat{P}_k = (I - K_k H_k) \hat{P}_k^-, \]
where $K_k$ is the Kalman gain introduced in (2.11). EKF performs well when the measurements are nearly linear, as is the case with satellite measurements. Base station measurements, however, have much smaller ranges, and thus the non-linearities might become a problem. The effects of non-linearities in hybrid positioning have been studied in [10]. It was shown that the non-linearities are insignificant in satellite measurements, but often significant in base station measurements, and that EKF easily veers away from the true track and gets stuck in an incorrect solution branch when using base station measurements.
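To make the linearization concrete, the sketch below performs one EKF update for a single range measurement $y_k = \|r_u - r_s\| + v_k$ from a transmitter at a known position $r_s$ (a satellite or a base station), whose Jacobian row is the unit vector from the transmitter to the user. It is illustrative only: the assumption that the position occupies the first three state components, the function names and the noise value are the sketch's own, and the thesis implementations themselves are in Matlab.

import numpy as np

def ekf_range_update(x_prior, P_prior, y, r_s, r_var):
    # Predicted range h(x_prior) and its Jacobian at the prior mean (first-order Taylor expansion)
    diff = x_prior[:3] - r_s
    rng = np.linalg.norm(diff)
    H = np.zeros((1, x_prior.size))
    H[0, :3] = diff / rng                      # derivative of ||r_u - r_s|| w.r.t. the position
    # Linearized KF update: gain, posterior mean with the non-linear h, posterior covariance
    S = (H @ P_prior @ H.T).item() + r_var     # scalar innovation variance
    K = (P_prior @ H.T / S).ravel()
    x_post = x_prior + K * (y - rng)
    P_post = (np.eye(x_prior.size) - np.outer(K, H)) @ P_prior
    return x_post, P_post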

2.4 M-estimators

In this section, M-estimators are discussed. First, some basic concepts of estimation are introduced and Huber's minimax approach is considered. Then, the minimax ideology is applied to the ǫ-contaminated normal neighborhood and the p-point family, and the most robust M-estimators for these two classes of densities are given. Finally, other M-estimators are also proposed. Estimation, in general, means the calculation of some numerical quantity which is related to a stochastic variable. The calculation is based on realizations of the stochastic variable. The estimated quantity may be a parameter describing the shape or location of the pdf of the stochastic variable, but it can also be, for example, the probability of some event related to the stochastic variable. The estimation is called point estimation if only one parameter vector is desired as an estimate. Otherwise it is called confidence interval estimation. It is assumed that the reader is familiar with the basic concepts of estimation theory such as the estimator (denoted by $\hat{\boldsymbol{\theta}}$) and the estimate (denoted by $\hat{\theta}$). Probably the best known estimation method is the least squares method, which was discovered by Gauss in 1795 and independently by Legendre in 1805. It is based on the idea of minimizing the sum of the squared errors, that is, choosing the unknown parameters so that the sum of the squares of the differences between the observed and the estimated values is minimized. The problem can be formulated symbolically as
\[ \hat{\theta}_{LS} = \arg\min_{\theta} \sum_{i=1}^{n} (x_i - \theta)^2. \]
The minimum is attained by the sample mean $\hat{\theta}_{LS} = \frac{1}{n}\sum_{i=1}^{n} x_i$, which is unfortunately quite sensitive to outliers, i.e., observations that are far away from the bulk of the data and thus regarded as corrupted or blunder observations. Least squares estimation performs well when the data is normally distributed, and in fact the maximum likelihood estimator for normally distributed data is the least squares estimator. According to Huber [3], Gauss was fully aware that his main reason for assuming an underlying normal density and a quadratic loss function was computational convenience. Later on this was often forgotten, partly because of the central limit theorem, and little was done to improve the theory of estimation. The question of the inadequacy of the normal density was brought up only in the early sixties by Tukey et al. [11]. They examined what happens when the true density deviates slightly from the assumed normal density. Although many densities occurring in practice are approximately normal, slight deviations from normality occur quite

often and drastically damage the performance of the sample mean estimator. Therefore robust methods, i.e., methods that are not so sensitive to outliers, had to be developed. Nowadays, there are quite a few well-known methods to make an estimator more robust. Maybe the oldest idea to obtain robustness is to reject extreme outliers entirely. According to Hampel et al. [12], estimators with this property go back at least to Daniel Bernoulli in 1769. One way to obtain robustness would be to first reject some observations considered as outliers and then run a least squares fit. The problem is the identification of outliers, which may not be an issue when the number of observations is large and the number of outlier observations small. However, in some applications the number of observations may be small or the estimation process itself may be more complicated. In these cases, the rejection of outliers may not be possible or reasonable. Another reason for not rejecting the outliers entirely is that one might have some knowledge of what causes the outlier observations, and thereby, with proper weighting or correction of the observations, the little information the corrupted observations contain could be used. Tukey et al. [11] proposed several robust estimators, for example, trimmed means and winsorized means. However, the general theory of robust estimation bided its time until Huber [3] published his famous paper on robust estimation in 1964 and introduced the concept of M-estimators.

Definition 2.9 (M-estimator). Let $x : \Omega \to \mathbb{R}$ be a stochastic variable and $\theta \in \mathbb{R}^p$ an unknown parameter. Let $\mathbf{X}_n$ be a sample from x and $X_n = (x_1, \ldots, x_n)$ the realization of $\mathbf{X}_n$. An M-estimator $\hat{\boldsymbol{\theta}}$ is defined by the value $\hat{\theta} \in \mathbb{R}^p$ that minimizes $\sum_{i=1}^{n} \rho(x_i, \hat{\theta})$, where $\rho : X_n \times \mathbb{R}^p \to \mathbb{R}$ is a convex function.

Since in this work the interest is in the location parameter of a pdf, and there is an underlying measurement model, the M-estimator is defined by a convex $\rho$ such that $\sum_{i=1}^{n} \rho(x_i - h(\hat{\theta}))$ is minimized. This can be symbolically formulated as
\[ \hat{\boldsymbol{\theta}} = \arg\min_{\theta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho(\mathbf{x}_i - h(\theta)), \]
and for the M-estimate $\hat{\theta}$ and a certain realization $X_n$ it holds that
\[ \hat{\theta} = \arg\min_{\theta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho(x_i - h(\theta)). \quad (2.12) \]
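Both the sample mean and the sample median are of the form (2.12), with $\rho(t) = t^2$ and $\rho(t) = |t|$ respectively, and a toy computation with made-up numbers illustrates the sensitivity to a single blunder discussed above:

import numpy as np

data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 55.0])  # five good observations and one blunder
print(np.mean(data))    # 17.5  - the least squares estimate is pulled far from the bulk of the data
print(np.median(data))  # 10.05 - the median is essentially unaffected by the single outlier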

It is required that $\rho(t)$ is decreasing when $t \in (-\infty, 0)$ and increasing when $t \in [0, \infty)$. If $\rho$ is differentiable, the minimum can be found by setting the gradient of $\theta \mapsto \sum_{i=1}^{n} \rho(x_i - h(\theta))$ to zero. Define
\[ \psi(x_i - h(\theta)) = \nabla_\theta\, \rho(x_i - h(\theta)) = \left[ \frac{\partial}{\partial\theta_1} \rho(x_i - h(\theta)), \ldots, \frac{\partial}{\partial\theta_p} \rho(x_i - h(\theta)) \right]^T. \]
Now the M-estimator is defined by a function $\psi : X_n \times \mathbb{R}^p \to \mathbb{R}^p$ as the solution of the vector equation
\[ \sum_{i=1}^{n} \psi(x_i - h(\theta)) = 0. \]
If an M-estimator is defined by a function $\psi$, it is called an M-estimator of $\psi$-type. If it is defined by a function $\rho$ which is not differentiable, it is called an M-estimator of $\rho$-type. The functions $\rho$ and $\psi$ are called the score function and the $\psi$-function of the M-estimator, respectively. M-estimators can be considered as generalized maximum likelihood estimators. The maximum likelihood estimator is a special case obtained by choosing $\rho(x_i, \theta) = -\ln f_x(x_i, \theta)$. Other familiar estimators arise from convenient choices of the $\rho$-function. The least squares estimator is obtained by choosing $\rho(x_i, \theta) = (x_i - h(\theta))^2$, and $\rho(x_i, \theta) = |x_i - h(\theta)|$ gives the median.

2.4.1 Minimax Robustness

Before formulating the minimax robustness problem, some concepts needed in the formulation of the problem are introduced. Hampel [13] considered estimators as real-valued functionals for which the inputs are probability densities and the outputs are estimates. It is a valid generalization since every sample may be considered as a set of point masses. Hampel et al. [12] introduced several useful concepts related to robust statistics, and some of them are used in this work.

Definition 2.10 (Influence Function). Let $\hat{\boldsymbol{\theta}}$ be an M-estimator and $f_x$ the density function. Then the influence function $\psi_{IF}$ of $\hat{\boldsymbol{\theta}}$ at $f_x$ is given by
\[ \psi_{IF}(x, \hat{\theta}, f_x) = \lim_{t \to 0^+} \frac{\hat{\theta}\big((1 - t) f_x(x) + t\, \delta_x\big) - \hat{\theta}\big(f_x(x)\big)}{t} \]
in those $x \in \mathbb{R}$ where the limit exists. The operator $\delta_x$ assigns mass 1 to the point x.

Hampel [12] showed that for M-estimators of $\psi$-type the $\psi_{IF}$ may also be defined by the $\psi$-function
\[ \psi_{IF}(x, \hat{\theta}, f_x) = \frac{\psi(x, \hat{\theta}(f_x))}{-\int \left[ \frac{\partial}{\partial\theta} \psi(y, \theta) \right]_{\theta = \hat{\theta}} f_x(y)\, dy}. \]
Since the interest is only in the shape of the influence function, the denominator and the minus sign are left out. Thus, the influence function may be written as
\[ \psi_{IF}(x, \hat{\theta}, f_x) := \psi_{IF}(x - h(\hat{\theta}), f_x) = \psi(x - h(\hat{\theta}(f_x))), \]
which is the same as the $\psi$-function of an M-estimator. Thus, the $\psi$-function of an M-estimator will later be referred to as the influence function. In terms of the influence function, the rejection of outliers means that $\psi_{IF}$ vanishes outside a certain area. Indeed, if $\psi_{IF}$ is identically zero in some region, contamination at those points does not have any influence at all. In case the underlying density $f_x$ is symmetric and has zero mean, the rejection point may be defined as follows.

Definition 2.11 (Rejection Point). Let $f_x$ be a symmetric pdf with zero mean and let $\psi_{IF}(x - h(\hat{\theta}), f_x)$ be the influence function of $\hat{\boldsymbol{\theta}}$ at $f_x$. The rejection point $\rho^*$ of $\hat{\boldsymbol{\theta}}$ at $f_x$ is then given by
\[ \rho^* = \inf\{ r > 0 : \psi_{IF}(x - h(\hat{\theta}), f_x) = 0 \text{ for } |x - h(\hat{\theta})| > r \}, \]
or equivalently for the $\psi$-function of an M-estimator
\[ \rho^* = \inf\{ r > 0 : \psi(x - h(\hat{\theta})) = 0 \text{ for } |x - h(\hat{\theta})| > r \}, \]
where the absolute values and the inequalities are considered as element-wise operations. All observations farther away from the estimate $h(\hat{\theta})$ than $\rho^*$ are rejected completely. If the rejection of outliers is desired, $\rho^*$ has to be finite. Such estimators are called redescending estimators. Huber [3] proposed a game where Nature chooses the density $f_x \in F$ and the statistician chooses the estimator $\hat{\boldsymbol{\theta}} \in T$. The asymptotic variance $V(\hat{\boldsymbol{\theta}}, f_x)$ is the pay-off to the statistician, and it is defined as follows.

Definition 2.12 (Asymptotic Variance). Let $\hat{\boldsymbol{\theta}}$ be an M-estimator. Under certain regularity conditions $\sqrt{n}(\hat{\theta}_n - \theta)$ is asymptotically normal with asymptotic variance $V(\hat{\boldsymbol{\theta}}, f_x)$ given as
\[ V(\hat{\boldsymbol{\theta}}, f_x) = \int \psi_{IF}(x, \hat{\theta}, f_x(x))\, \psi_{IF}(x, \hat{\theta}, f_x(x))^T f_x(x)\, dx. \]
It can be shown that for certain classes of densities there exists a minimax solution $(\hat{\boldsymbol{\theta}}^0, f_x^0)$ such that
\[ \min_{\hat{\boldsymbol{\theta}} \in T} \max_{f_x \in F} V(\hat{\boldsymbol{\theta}}, f_x) = V(\hat{\boldsymbol{\theta}}^0, f_x^0) = \max_{f_x \in F} \min_{\hat{\boldsymbol{\theta}} \in T} V(\hat{\boldsymbol{\theta}}, f_x). \]
The density $f_x^0$ is called the least favorable density of the class $F$ and $\hat{\boldsymbol{\theta}}^0$ the minimax robust estimator, which is actually the maximum likelihood estimator for the least favorable density $f_x^0$. Let $f_x(x, \theta)$ be the pdf of x with parameter $\theta$. Assume that x is constant. Then the likelihood function is $L(\theta) = L(\theta, x) = f_x(x, \theta)$ and the log-likelihood function $l(\theta) = \ln L(\theta) = \ln f_x(x, \theta)$.

Definition 2.13 (Likelihood Score). The likelihood score is defined as
\[ s(\theta) = \nabla_\theta\, l(\theta) = \nabla_\theta \ln f_x(x, \theta) = \left[ \frac{\partial}{\partial\theta_1} \ln f_x(x, \theta), \ldots, \frac{\partial}{\partial\theta_p} \ln f_x(x, \theta) \right]^T. \]
Consider the maximum likelihood estimator for $f_x(x, \theta)$, which corresponds to an M-estimator with the score function chosen as $\rho(x, \theta) = -\ln f_x(x, \theta)$. Now if $f_x$ is differentiable, the $\psi$-function of the maximum likelihood M-estimator exists and corresponds to $\psi(x, \theta) = \nabla_\theta\, \rho(x, \theta) = -\nabla_\theta \ln f_x(x, \theta)$. Thus, the $\psi$-function is the same as the negative likelihood score of the underlying pdf. In the next section two sets of density functions, namely, the ǫ-contaminated normal neighborhood $F_\epsilon$ and the p-point family $F_p$, and their corresponding most robust M-estimators are introduced.

2.4.2 Contamination Classes

The ǫ-contaminated normal neighborhood was first proposed by Huber [3] to be used in robust parameter estimation and it is defined as follows.

Definition 2.14 (ǫ-contaminated Normal Neighborhood). The set of density functions $F_\epsilon$ is called the ǫ-contaminated normal neighborhood if $F_\epsilon = \{(1 - \epsilon)\phi(x) + \epsilon h(x) : h \in S\}$, where $S$ is the set of all suitably regular¹ pdfs, and $0 \le \epsilon \le 1$ is the known fraction of contamination.

Huber showed that the least favorable density of this class is Gaussian shaped in the middle, but has exponential tails. Later on, this density is denoted by $f_\epsilon^0$ and is given by
\[ f_\epsilon^0(\theta) = \begin{cases} \frac{1-\epsilon}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\theta^2}, & |\theta| \le k \\ \frac{1-\epsilon}{\sqrt{2\pi}}\, e^{\frac{1}{2}k^2 - k|\theta|}, & |\theta| > k. \end{cases} \]
The connection between the threshold parameter $k$ and the amount of contamination ǫ is given by
\[ \frac{2\phi(k)}{k} - 2\Phi(-k) = \frac{\epsilon}{1-\epsilon} \]
[14]. Usually this equation has to be solved numerically. The influence function of $f_\epsilon^0$ is
\[ \psi_{F_\epsilon}(\theta) = -\frac{\partial}{\partial\theta} \ln f_\epsilon^0(\theta) = \begin{cases} \theta, & |\theta| \le k \\ k\, \mathrm{sign}(\theta), & |\theta| > k. \end{cases} \]
Define the weight function $\omega$ of an M-estimator as $\omega(\theta) = \psi(\theta)/\theta$, since it will be used later. The weight function for Huber's M-estimator is
\[ \omega_{F_\epsilon}(\theta) = \begin{cases} 1, & |\theta| \le k \\ \frac{k}{|\theta|}, & |\theta| > k. \end{cases} \]
Figure 2.1 shows the influence function and the weight function for the most robust M-estimator for $F_\epsilon$, which is also known as the Huber estimator.

Figure 2.1: The influence and the weight function of the Huber estimator.

¹ In this work the set of suitably regular pdfs is restricted to continuous symmetric pdfs.
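The influence and weight functions of the Huber estimator translate directly into code. The sketch below (Python/NumPy, illustrative only; the function names are the sketch's own) clips the residual at the threshold k; in practice k would first be solved numerically from the relation 2φ(k)/k − 2Φ(−k) = ǫ/(1 − ǫ) above.

import numpy as np

def psi_huber(theta, k):
    # Influence function of the Huber estimator: identity inside [-k, k], clipped outside
    return np.clip(theta, -k, k)

def weight_huber(theta, k):
    # Weight function omega(theta) = psi(theta)/theta: 1 for |theta| <= k, k/|theta| otherwise
    theta = np.asarray(theta, dtype=float)
    w = np.ones_like(theta)
    outside = np.abs(theta) > k
    w[outside] = k / np.abs(theta[outside])
    return w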

Another interesting family of densities, namely, the p-point family, was used in robust parameter estimation by Martin and Masreliez [4, 5], and it is defined as follows.

Definition 2.15 (p-point Family). The set of pdfs $F_p$ is called a p-point family if
\[ F_p = \left\{ f : \int_{y_p}^{\infty} f(\theta)\, d\theta = p/2 = \Phi(-y_p),\ f \text{ symmetric and continuous at } \pm y_p \right\}. \]

The inclusion of the restriction that $F_p$ contains $\Phi(\cdot)$ is for standardization purposes, that is, to ensure that $F_p$ is in the neighborhood of the standard normal density. Masreliez et al. [5] show that the least favorable density $f_p^0$ of $F_p$ is
\[ f_p^0(\theta) = \begin{cases} K \cos^2\!\left(\frac{\theta}{2 s_m y_p}\right), & |\theta| \le y_p, \\ K \cos^2\!\left(\frac{1}{2 s_m}\right) e^{-\frac{2K}{p} \cos^2\!\left(\frac{1}{2 s_m}\right)(|\theta| - y_p)}, & |\theta| > y_p, \end{cases} \]
where $K$ is defined by
\[ \int_{-y_p}^{y_p} K \cos^2\!\left(\frac{\theta}{2 s_m y_p}\right) d\theta = 1 - p. \quad (2.13) \]
For each $p$ there exists an $s_m$ that minimizes the asymptotic variance. The minimizing value $s_m$ satisfies the equation
\[ 2 s_m - p \left[ 1 + \tan^2\!\left(\frac{1}{2 s_m}\right) \right] \left[ 2 s_m + \tan\!\left(\frac{1}{2 s_m}\right) \right] = 0 \]
and the connection between $p$ and $y_p$ is given in Definition 2.15. Solving (2.13) for $K$ yields
\[ K = \frac{1 - p}{y_p \left( 1 + s_m \sin\!\left(\frac{1}{s_m}\right) \right)}. \]
The influence function of the least favorable density of the p-point family $F_p$ is
\[ \psi_{F_p}(\theta) = -\frac{\partial}{\partial\theta} \ln f_p^0(\theta) = \begin{cases} \frac{1}{s_m y_p} \tan\!\left(\frac{\theta}{2 s_m y_p}\right), & |\theta| \le y_p, \\ \frac{1}{s_m y_p} \tan\!\left(\frac{1}{2 s_m}\right) \mathrm{sign}(\theta), & |\theta| > y_p, \end{cases} \]
and the weight function is
\[ \omega_{F_p}(\theta) = \begin{cases} \frac{1}{s_m y_p \theta} \tan\!\left(\frac{\theta}{2 s_m y_p}\right), & |\theta| \le y_p, \\ \frac{1}{s_m y_p |\theta|} \tan\!\left(\frac{1}{2 s_m}\right), & |\theta| > y_p. \end{cases} \]
Figure 2.2 shows the influence function and the weight function for the most robust M-estimator for $F_p$, called the p-point M-estimator.

Figure 2.2: The influence and the weight function of the p-point M-estimator.

The ǫ-contaminated normal neighborhood and the p-point family are considered for robust filter design in Chapter 3.

2.4.3 Computation of the M-estimates

Computing an M-estimate might require numerical algorithms because the $\rho$- and $\psi$-functions are usually non-linear. However, the minimum in (2.12) can be obtained approximately using a linear function instead. In this thesis the linear approximation is used to calculate the M-estimates, and the procedure is described next. Approximate $\psi_j(x_i - h(\theta))$ with
\[ \frac{\psi_j(x_i - h(\theta_0))}{(x_i - h(\theta_0))_j}\, (x_i - h(\theta))_j, \]
where $\theta_0$ is hopefully near the true unknown parameter $\theta$. Usually, $\theta_0$ is chosen to be, for example, the sample mean or the median. In vector form the approximation looks like
\[ \psi(x_i - h(\theta)) \approx W_i (x_i - h(\theta)), \]
where
\[ W_i = \mathrm{diag}\left( \frac{\psi_1(x_i - h(\theta_0))}{(x_i - h(\theta_0))_1}, \ldots, \frac{\psi_p(x_i - h(\theta_0))}{(x_i - h(\theta_0))_p} \right) \quad (2.14) \]
is called the weight matrix. Using the weight function, (2.14) may be written as
\[ W_i = \mathrm{diag}\big( \omega_1(x_i - h(\theta_0)), \ldots, \omega_p(x_i - h(\theta_0)) \big). \]
Therefore, instead of solving the non-linear equation $\sum_{i=1}^{p} \psi(x_i - h(\theta)) = 0$, the approximate equation
\[ \sum_{i=1}^{p} W_i (x_i - h(\theta)) = 0 \;\Leftrightarrow\; \sum_{i=1}^{p} W_i x_i = \sum_{i=1}^{p} W_i h(\theta) \quad (2.15) \]

is solved. Thus, the computation of the M-estimate becomes a weighted least squares estimation problem. In general, $h$ is a non-linear function, and the solution is quite difficult to obtain, at least analytically. In order to overcome this problem, numerical methods have to be used. In case $h = H$ is linear, the solution is given by
\[ \theta = (H^T H)^{-1} H^T \left( \sum_{i=1}^{p} W_i \right)^{-1} \left( \sum_{i=1}^{p} W_i x_i \right). \]
In the case of non-redescending M-estimators, such as the Huber or the p-point estimator, the weight matrices $W_i$ are positive definite diagonal matrices, and thus non-singular. The sum of the weight matrices is then also non-singular, and the inverse exists. This method will later be used in robust filter design to incorporate M-estimates into KF.

2.4.4 Other Examples of M-estimators

The influence function of an M-estimator tells how certain observations affect the estimate. Thus, it might be more reasonable to try to find an influence function that suits the purposes of the application best using heuristic methods. In addition to the already introduced M-estimators, two other examples of M-estimators are provided next. Redescending M-estimators pose a difficulty in robust filter design, since the rejection of outliers might not be reasonable. Thus, redescending M-estimators are not used in this work. Instead, new non-redescending versions are developed. The redescending versions are introduced before the modifications.

Hampel's Three-parts Redescending M-estimator

Hampel [12] proposed a three-parts redescending M-estimator for which the influence function $\psi_{HA}$ is given by
\[ \psi_{HA}(\theta) = \begin{cases} \theta, & |\theta| \le k_1 \\ k_1\, \mathrm{sign}(\theta), & k_1 < |\theta| \le k_2 \\ k_1 \frac{k_3 - |\theta|}{k_3 - k_2}\, \mathrm{sign}(\theta), & k_2 < |\theta| \le k_3 \\ 0, & |\theta| > k_3, \end{cases} \]
and the corresponding weight function $\omega_{HA}$ is given by

\[ \omega_{HA}(\theta) = \begin{cases} 1, & |\theta| \le k_1 \\ \frac{k_1}{|\theta|}, & k_1 < |\theta| \le k_2 \\ \frac{k_1}{|\theta|} \frac{k_3 - |\theta|}{k_3 - k_2}, & k_2 < |\theta| \le k_3 \\ 0, & |\theta| > k_3. \end{cases} \]
The threshold parameters $k_i$ are sometimes called hampels after their inventor, and they have to be determined with some knowledge of the data for which the M-estimator is to be used. Figure 2.3 shows the influence function and the weight function of Hampel's three-parts redescending M-estimator.

Figure 2.3: The influence function and the weight function of Hampel's three-parts redescending M-estimator.

Damped Hampel Estimator

For Hampel's three-parts redescending M-estimator the modified version is called the Damped Hampel estimator (DHA), and its influence function is defined as
\[ \psi_{DHA}(\theta) = \begin{cases} \theta, & |\theta| < k_1 \\ k_1\, \mathrm{sign}(\theta), & k_1 \le |\theta| < k_2 \\ \frac{k_1 k_2}{|\theta|}\, \mathrm{sign}(\theta), & |\theta| \ge k_2, \end{cases} \]
and the weight function is given by
\[ \omega_{DHA}(\theta) = \begin{cases} 1, & |\theta| < k_1 \\ \frac{k_1}{|\theta|}, & k_1 \le |\theta| < k_2 \\ \frac{k_1 k_2}{\theta^2}, & |\theta| \ge k_2. \end{cases} \]
Figure 2.4 shows the influence function and the weight function of DHA.

Figure 2.4: The influence function and the weight function of the Damped Hampel M-estimator.

If the application requires a faster convergence to zero, the influence and weight function could be defined as
\[ \psi_{DHA}(\theta) = \begin{cases} \theta, & |\theta| < k_1 \\ k_1\, \mathrm{sign}(\theta), & k_1 \le |\theta| < k_2 \\ \frac{k_1 k_2^r}{|\theta|^r}\, \mathrm{sign}(\theta), & |\theta| \ge k_2 \end{cases} \]
and
\[ \omega_{DHA}(\theta) = \begin{cases} 1, & |\theta| < k_1 \\ \frac{k_1}{|\theta|}, & k_1 \le |\theta| < k_2 \\ \frac{k_1 k_2^r}{|\theta|^{r+1}}, & |\theta| \ge k_2, \end{cases} \]
respectively, where $r > 1$. Figure 2.5 shows the influence and the weight function of DHA with $r = 2, 3, 4, 5$.

Figure 2.5: The influence and the weight function of the Damped Hampel M-estimator of faster convergence for r = 2, 3, 4, 5.

Various other M-estimators have been developed, for example, Tukey's biweight and Andrews' sine [12]. It is easy to invent several more, and actually,

the best way to choose an M-estimator for a specific problem would be to empirically find the best weight function using some suitable optimization algorithm. This, however, is out of the scope of this thesis, and thus only DHA together with the Huber and the p-point M-estimators are considered in robust filter design.
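Putting the pieces of Section 2.4.3 together, the weighted least squares equation (2.15) can be re-solved a few times, recomputing the weights at the current estimate, to obtain a location M-estimate. The sketch below does this for the simplest case h(θ) = θ with Huber weights (repeated here so the sketch is self-contained); the data, the threshold value and the variable names are illustrative and not taken from the thesis code, which is in Matlab.

import numpy as np

def huber_weight(t, k):
    # Huber weight function: 1 inside [-k, k], k/|t| outside
    t = np.asarray(t, dtype=float)
    w = np.ones_like(t)
    outside = np.abs(t) > k
    w[outside] = k / np.abs(t[outside])
    return w

def m_estimate_location(x, k=1.345, iters=20):
    # Iteratively re-weighted least squares: each pass solves the scalar case of (2.15),
    # sum_i w_i * x_i = (sum_i w_i) * theta, with weights evaluated at the previous estimate
    theta = np.median(x)                      # robust starting point theta_0
    for _ in range(iters):
        w = huber_weight(x - theta, k)
        theta = np.sum(w * x) / np.sum(w)
    return theta

x = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 55.0])   # hypothetical data with one blunder
print(m_estimate_location(x))                      # about 10.3, versus 17.5 for the sample mean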

Chapter 3 Robust Kalman Filtering

This chapter is devoted to the robustification of KF. The ideas are based on Huber's M-estimator theory presented in Section 2.4 and the work of Martin and Masreliez which will be presented in Section 3.1. The filter derived in Section 3.1 is called the Approximate Bayesian Kalman Filter (ABKF). In Section 3.2, it will be shown that at each time step KF actually solves a least squares problem, which is then formulated as an M-estimation problem. The M-estimation problem is solved as in Section 2.4.3, and the resulting filter is called the Re-weighted Kalman Filter (RKF). Both methods are based on the innovation vector, which plays a key role in the KF recursions introduced in Section 2.2. A combination of ABKF and RKF is also considered in Section 3.3, and it is called the Hybrid Kalman Filter (HKF).

3.1 Approximate Bayesian Kalman Filter

In this section, an approximate Bayesian filter for linear dynamic systems is presented. The filter is directly based on the Bayesian interpretation of KF presented in Section 2.2. The filter is extended to non-linear problems at the end of the section.

Theorem 3.1. Let $A \in \mathbb{R}^{n \times n}$ be symmetric. There exists an orthogonal square matrix $Q = [q_1, \ldots, q_n]$ and a diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ such that $A = Q \Lambda Q^T$.

Proof. The spectral theorem states that for every symmetric square matrix with real entries there exists an orthonormal basis $q_1, \ldots, q_n$ of $\mathbb{R}^n$ and numbers $\lambda_1, \ldots, \lambda_n$ such that $A q_i = \lambda_i q_i$, $i \in \{1, \ldots, n\}$ [15, p. 313-315]. Thus,
\[ [A q_1, \ldots, A q_n] = [\lambda_1 q_1, \ldots, \lambda_n q_n] \;\Leftrightarrow\; A [q_1, \ldots, q_n] = [q_1, \ldots, q_n] \Lambda \;\Leftrightarrow\; A Q = Q \Lambda \;\Leftrightarrow\; A = Q \Lambda Q^T. \]
The diagonal elements of $\Lambda$ are the eigenvalues and the columns of $Q$ are the corresponding orthonormal eigenvectors of $A$. Using Theorem 3.1 the square root of a symmetric positive semidefinite matrix is defined as follows.

Definition 3.2 (Square Root of a Symmetric Positive Semidefinite Matrix). The square root of a symmetric positive semidefinite matrix $A \in \mathbb{R}^{n \times n}$ is defined as
\[ A^{\frac{1}{2}} = Q\, \mathrm{diag}\big(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}\big)\, Q^T. \]
It follows from the symmetry of $A$ that the square root $A^{\frac{1}{2}}$ is also symmetric. Consider the linear transformation matrix
\[ T_k = \big((H_k \hat{P}_k^- H_k^T + R_k)^{-1}\big)^{\frac{1}{2}} := (H_k \hat{P}_k^- H_k^T + R_k)^{-\frac{1}{2}}, \]
where $H_k$, $\hat{P}_k^-$ and $R_k$ are the linear measurement function, the prior covariance matrix and the measurement covariance matrix, respectively, that appear in KF. The inverse exists and is symmetric since $H_k \hat{P}_k^- H_k^T + R_k$ is symmetric and positive definite. Recall that the mean of the innovation $s_k$ is $E(s_k) = 0$ and the covariance matrix $V(s_k) = H_k \hat{P}_k^- H_k^T + R_k$. Thus, the mean of the transformed innovation $r_k = T_k s_k$ is $E(r_k) = 0$ and the covariance matrix $V(r_k) = T_k V(s_k) T_k^T = I$. Now consider the posterior mean estimate of KF
\[ \hat{x}_k = \int x_k\, f_{x_k^+}(x_k \mid y_{1:k})\, dx_k. \]
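As an aside, Definition 3.2 and the transformed innovation above translate directly into code. The following sketch (Python/NumPy; names illustrative, not the thesis code, which is in Matlab) computes the matrix square root as in Theorem 3.1 and whitens the innovation so that it has identity covariance under the KF assumptions.

import numpy as np

def sqrtm_psd(A):
    # Square root of a symmetric positive semidefinite matrix via Theorem 3.1:
    # A = Q diag(lambda) Q^T  =>  A^(1/2) = Q diag(sqrt(lambda)) Q^T
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ Q.T

def whitened_innovation(y, x_prior, P_prior, H, R):
    # T_k = (H_k P_k^- H_k^T + R_k)^(-1/2) and r_k = T_k s_k, with E(r_k) = 0 and V(r_k) = I
    s = y - H @ x_prior
    S = H @ P_prior @ H.T + R
    T = sqrtm_psd(np.linalg.inv(S))
    return T @ s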