Mobile Sensing III Signal Processing for Sensor Data Analysis. Petteri Nurmi Spring 2015

Mobile Sensing III: Signal Processing for Sensor Data Analysis. Petteri Nurmi, Spring 2015. 17.3.2015

Learning Objectives
Understand the basics of time-domain and frequency-domain representations of signals.
Understand the different phases of the sensor data analysis cycle.
Learn the most common preprocessing techniques.
Learn the most common types of features.
Understand how to measure the similarity of sensor measurements.

Signals
In sensor analysis, signals are typically multidimensional functions of time.
Example: 3D accelerometer data, with one signal for each of the x, y, and z axes plotted against time.
Note: the coordinate axes are relative to the orientation of the device.

Signals: Sampling
Continuous signals cannot be collected or stored, so instead we operate on discrete samples.
The sampling rate, given in Hz, defines how often samples are taken; the sampling interval is its inverse.
Example: samples taken within 1 s from a magnetometer sampled at 95 Hz.

Signals: Noise
In practice, measurements are noisy versions of the true underlying signal.
Linear model: $z_i = x_i + v_i$, where $v_i$ is the noise, $z_i$ is the observed measurement, and $x_i$ is the true signal value.
The noise is often assumed to be Gaussian, zero-mean, and independent and identically distributed (i.i.d.).

Frames
Measurements are typically processed in frames, i.e., sequences of successive measurements.
Frame width refers to the number of measurements included in a frame. The sampling rate is not necessarily uniform, so it is often better to define the width in terms of time instead.
The overlap of a frame is the fraction of measurements it has in common with the previous frame.
Example: no overlap versus 50% overlap.
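The framing scheme can be sketched as follows. This is a minimal illustration, assuming a uniformly sampled signal so that the frame width can be given as a number of samples; the function name `split_into_frames` and the choice to drop an incomplete trailing frame are assumptions of this sketch, not part of the lecture.

```python
def split_into_frames(samples, width, overlap=0.0):
    """Split a sequence into frames of `width` samples.

    `overlap` is the fraction (0 <= overlap < 1) of samples that a frame
    shares with the previous frame. Trailing samples that do not fill a
    complete frame are dropped (an assumption of this sketch).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Number of samples by which consecutive frames are shifted.
    step = max(1, int(round(width * (1.0 - overlap))))
    return [list(samples[i:i + width])
            for i in range(0, len(samples) - width + 1, step)]


signal = list(range(8))
print(split_into_frames(signal, 4))        # no overlap
print(split_into_frames(signal, 4, 0.5))   # 50% overlap
```

With 50% overlap each frame starts half a frame after the previous one, so the same eight samples yield three frames instead of two.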

Sample Synchronization
Measurements from different sensors often arrive at different times, so the samples need to be synchronized.
Consider, e.g., two samples from an accelerometer (A) and a magnetometer (M) with the following timestamps:
Sample 1: A: 1320133789651, M: 1320133789652
Sample 2: A: 1320133789663, M: 1320133789664
Possible ways to achieve sample synchronization:
1. Take the time indices from the sensor with the highest sampling rate and align each of them with the measurement from the other sensor that has the closest timestamp.
2. Fix a uniform time window size and interpolate values at specific times. This is computationally heavier, but recommended particularly when the sampling rate of a sensor is non-uniform.
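The first approach (closest-timestamp alignment) can be sketched as follows. The function name `align_nearest` and the placeholder readings are hypothetical; the sketch assumes that the timestamps of the second sensor are sorted in increasing order and non-empty.

```python
from bisect import bisect_left


def align_nearest(ref_times, other_times, other_values):
    """For each reference timestamp, return the value of the other sensor
    whose timestamp is closest. `other_times` must be sorted."""
    aligned = []
    for t in ref_times:
        pos = bisect_left(other_times, t)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = [j for j in (pos - 1, pos) if 0 <= j < len(other_times)]
        best = min(candidates, key=lambda j: abs(other_times[j] - t))
        aligned.append(other_values[best])
    return aligned


# Timestamps from the example above; the readings are placeholders.
accel_times = [1320133789651, 1320133789663]
mag_times = [1320133789652, 1320133789664]
mag_values = ["m1", "m2"]
print(align_nearest(accel_times, mag_times, mag_values))
```

The binary search keeps the alignment at O(log n) per reference sample, which matters when the slower sensor has produced a long history.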

Interpolation
A method of constructing new data points within the range of a discrete set of known points.
Assume two points $(x_0, y_0)$ and $(x_1, y_1)$, i.e., pairs of time and sensor value, are given.
Linear interpolation: $y = y_0 + (y_1 - y_0)\,\frac{x - x_0}{x_1 - x_0}$ for $x_0 \le x \le x_1$. This is effectively a weighted average in which the weights depend on the distance from the two known points.
Spline interpolation: the intervals between points are modelled with low-order polynomials, and the polynomial pieces are selected so that they fit smoothly with each other.

Interpolation: Example
Consider the following data:
t      value
0      -1.59
1      -1.32
14     -1.05
23     -0.98
33     -0.94
What is the value of the signal at t = 10?
Linear interpolation: -1.13, using $x_0 = 1$, $y_0 = -1.32$, $x_1 = 14$, $y_1 = -1.05$, $x = 10$. Equivalently: $\frac{4}{13} y_0 + \frac{9}{13} y_1$.
Cubic interpolation: -0.83.
[Figure: original data and the cubic interpolation curve.]
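The linear case of this example can be reproduced with a few lines of code. This is a minimal sketch; the function name `linear_interpolate` is an assumption, and the cubic result is not reproduced here because it depends on the spline variant used.

```python
def linear_interpolate(x0, y0, x1, y1, x):
    """Linearly interpolate between (x0, y0) and (x1, y1) at position x."""
    if x1 == x0:
        raise ValueError("x0 and x1 must differ")
    # Weight of the second point grows as x moves from x0 towards x1.
    weight = (x - x0) / (x1 - x0)
    return (1.0 - weight) * y0 + weight * y1


# Neighbouring samples around t = 10 are (1, -1.32) and (14, -1.05).
value = linear_interpolate(1, -1.32, 14, -1.05, 10)
print(round(value, 2))  # -1.13
```

The weights are 4/13 for the earlier point and 9/13 for the later one, matching the weighted-average form given in the example.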

Time vs. Frequency Domain
Thus far we have looked at the time-domain representation of signals: time on the x-axis, value of the signal on the y-axis.
The alternative is to look at the frequency domain of the signal, so-called spectral analysis.
The function is decomposed into sinusoid components, and the analysis looks at the frequency, phase, and amplitude of these components.

Fourier Transform
A transformation that decomposes a function $f(x)$ into the frequencies that make it up.
Captures the energy of the signal within each frequency.
When $x$ is time (in seconds), the output represents frequency (in hertz).
Essential for capturing periodicity in signals.
Invertible transform, i.e., the original signal can be recovered from the transformed signal.
Fourier transform: $\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx$
Inverse Fourier transform: $f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi i x \xi}\, d\xi$

Discrete Fourier Transform (DFT)
Fourier transform for discrete and periodic signals.
A discrete approximation of the Fourier transform that estimates the transform at N possible output values; the integral in the Fourier transform is estimated as a discrete sum:
$X_k = \sum_{n=0}^{N-1} x_n\, e^{-2\pi i k n / N}, \quad k = 0, \dots, N-1$
Each $X_k$ is a complex number. The DFT effectively converts the signal into a sum of cosine and sine waves.
Inverse discrete Fourier transform: the reverse operation that recovers the original signal:
$x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k\, e^{2\pi i k n / N}$
Fast Fourier Transform (FFT): an algorithm for calculating the DFT and its inverse. Direct evaluation of the DFT takes $O(N^2)$ operations, whereas the FFT takes $O(N \log N)$.
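To make the discrete sum concrete, the following is a direct O(N^2) evaluation of the DFT and its inverse. It is a sketch for illustration only, assuming the common convention with a negative exponent in the forward transform and a 1/N factor in the inverse; the function names `dft` and `idft` are chosen here.

```python
import cmath


def dft(x):
    """Direct O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT; recovers the original sequence (as complex numbers)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


signal = [1.0, 2.0, 3.0, 4.0]
spectrum = dft(signal)
recovered = [round(v.real, 6) for v in idft(spectrum)]
print(recovered)  # [1.0, 2.0, 3.0, 4.0]
```

In practice an FFT routine would be used instead; the direct form is shown only because it mirrors the definition term by term.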

Computing FFT: Cooley-Tukey
Danielson-Lanczos lemma: a discrete Fourier transform of length N can be written as the sum of two discrete Fourier transforms of length N/2, one formed from the even-numbered points and the other from the odd-numbered points. The lemma can be applied recursively to further divide the computations.
Cooley-Tukey algorithm: an implementation of the FFT that builds on the Danielson-Lanczos lemma.
Practical implementations rely on so-called bit-reversal ordering of elements to significantly speed up calculations.
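The even/odd split of the Danielson-Lanczos lemma can be written as a short recursive function. This is a sketch of the textbook radix-2 recursion, assuming the input length is a power of two; it does not use the bit-reversal ordering that practical implementations rely on, and the function name `fft` is chosen here.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT. The length of x must be a power of two."""
    n = len(x)
    if n == 0:
        raise ValueError("input must not be empty")
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of the even-indexed points
    odd = fft(x[1::2])   # transform of the odd-indexed points
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-length transforms.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# A constant signal has all of its energy in bin 0.
print([round(abs(c), 6) for c in fft([1, 1, 1, 1])])
```

Signals whose length is not a power of two are commonly zero-padded to the next power of two before calling such a routine.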

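Illustration of FFT
Data: 10.9, 14.9, 17.0, 16.1, 16.4, 19.1, 28.0, 20.3, 17.8, 19.0
FFT output (real part + i * imaginary part):
k      Re        Im
0      179.50    0.00
1      -38.37    -86.94
2      25.74     3.50
3      -5.69     -16.92
4      0.10      -16.60
5      7.44      13.92
6      -1.14     -18.50
7      9.01      9.50
8      0.70      0.00
9      9.01      -9.50
10     -1.14     18.50
11     7.44      -13.92
12     0.10      16.60
13     -5.69     16.92
14     25.74     -3.50
15     -38.37    86.94
The first value (k = 0) is the DC term, which equals the sum of the samples. The values for k = 9, ..., 15 are the complex conjugates of those for k = 7, ..., 1. The output has 16 values, which suggests that the 10 samples were zero-padded to length 16.
[Figure: the signal, its FFT, and IFFT(FFT(signal)).]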

Sensor Data Analysis Pipeline (Revisited from Lecture I)
Raw sensor data is turned into high-level contexts (walking, running, vacuuming, driving, ...) in the following stages:
1. Sensor data extraction: frames, synchronization.
2. Preprocessing: smoothing, noise removal, transformation.
3. Feature extraction: extracting time-domain and frequency-domain features from the measurements.
4. Inference: classification or regression.

Phase I: Preprocessing
Preprocessing refers to steps that are carried out on the data before analysing them.
Filtering: removing specific sources of noise.
Smoothing: reducing spikes in the measurements with the aim of capturing the main trend better.
Windowing: improving the frequency-domain representation (discussed during a later lecture).
Possible data transformations: some applications operate on alternative signal representations, e.g., the empirical cumulative distribution function (ECDF) in activity recognition and hyperbolic signals in WiFi.

Preprocessing: Filter Types
Noise removal is often done by removing the signal components that fall into a specific frequency range.
Low-pass filter: components with a frequency lower than a threshold are allowed to pass, higher frequencies are cut off.
High-pass filter: components with a frequency higher than a threshold are allowed to pass, lower frequencies are cut off.
Band-pass filter: components within a given frequency range are allowed to pass, others are discarded.
There are several ways to implement these filters.
FFT-based: cut off the frequencies outside the given range.
Windowed filters: allow a smoother form for the resulting signal.
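The FFT-based variant can be sketched as follows for a low-pass filter. For self-containment the sketch uses a direct O(N^2) DFT rather than an FFT routine, and the names `dft` and `fft_lowpass`, the test signal, and the cutoff value are all assumptions chosen for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) DFT (or inverse DFT) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def fft_lowpass(signal, sample_rate_hz, cutoff_hz):
    """Zero all frequency bins above cutoff_hz and transform back."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        # Bins above n/2 hold the negative frequencies; fold them back.
        freq = min(k, n - k) * sample_rate_hz / n
        if freq > cutoff_hz:
            spectrum[k] = 0j
    return [v.real for v in dft(spectrum, inverse=True)]


n = 32  # one second of data at 32 Hz
low = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]          # 1 Hz
high = [0.5 * math.sin(2 * math.pi * 8 * t / n) for t in range(n)]   # 8 Hz
mixed = [a + b for a, b in zip(low, high)]
filtered = fft_lowpass(mixed, sample_rate_hz=32, cutoff_hz=4)
print(max(abs(f - l) for f, l in zip(filtered, low)))  # close to zero
```

A hard cutoff like this works cleanly here because both components fall exactly on DFT bins; for general signals the abrupt cut introduces ringing, which is what windowed filters aim to reduce.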

Example: FFT preprocessing
[Figure: original data (before) and filtered data (after) in the time domain, with the FFT of the original data and the FFT of the filtered data in the frequency domain.]

Smoothing: Mean and Median Filters
Smoothing approximates the original signal by reducing short-term fluctuations.
Mean and median filters are common ways to implement smoothing:
Set a window $[i - w, i + w]$ around the current sample $i$, where $w$ is the width of the filter.
Replace the value of the sample with the mean or median of the window.
The median is more robust to outliers, but the mean can be calculated online.
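Both filters can be sketched with one helper that takes the reducing function as a parameter. The function name `smooth` and the handling of the signal edges (the window is simply truncated there) are assumptions of this sketch.

```python
from statistics import mean, median


def smooth(samples, w, reducer=mean):
    """Replace each sample by reducer(window), window = [i - w, i + w].

    At the edges the window is truncated to the available samples.
    """
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - w): i + w + 1]
        out.append(reducer(window))
    return out


spiky = [1, 1, 10, 1, 1]
print(smooth(spiky, 1, mean))    # the spike is spread over its neighbours
print(smooth(spiky, 1, median))  # the spike is removed entirely
```

The example illustrates the robustness remark: a single outlier shifts three of the mean-filtered values, whereas the median filter ignores it.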

Mean and Median Filters: Example
[Figure: the same signal after applying a mean filter and a median filter.]

Phase II: Feature Extraction
Feature extraction: identifying and extracting key characteristics of the signals or frames.
Typically, as many features as possible are extracted during the design phase, and the most useful ones are selected for the final sensing application.
Two main feature categories:
Time domain: features related to summary and order statistics.
Frequency domain: spectral characteristics, extracted from FFT-transformed signals.
A necessary evil: even the best classifier cannot reach high accuracy if the features are not discriminative!

Descriptive Statistics
Location (central tendency): arithmetic mean, median, mode.
Spread (statistical dispersion): variance/standard deviation, min, max, range.
Interquartile range (IQR): the difference between the 75th and the 25th percentile. Requires ordering the measurements.
Distribution-related features, i.e., shape parameters: skewness (bias to the right or left in the distribution) and kurtosis (peakedness of the distribution).

Example
Data: 10.9, 14.9, 17.0, 16.1, 16.4, 19.1, 28.0, 20.3, 17.8, 19.0
Location: mean = 18.0, median = 17.4. The mean is influenced more by the extreme value 28.
Spread: standard deviation = 4.5, max = 28, min = 10.9, range = 17.1.
IQR: 25th percentile = 16.1, 75th percentile = 19.1, so IQR = 3.
Distribution: skewness = 0.8, kurtosis = 4.0.
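The location, range, and IQR values of this example can be reproduced with the standard library. The sketch below assumes the nearest-rank definition of a percentile, which is the definition that yields the quartiles quoted in the example; other percentile definitions give slightly different values. The helper name `percentile_nearest_rank` is chosen here.

```python
import math
from statistics import mean, median


def percentile_nearest_rank(values, p):
    """p-th percentile (0 < p <= 100) using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


data = [10.9, 14.9, 17.0, 16.1, 16.4, 19.1, 28.0, 20.3, 17.8, 19.0]

q25 = percentile_nearest_rank(data, 25)
q75 = percentile_nearest_rank(data, 75)
print("mean   =", round(mean(data), 2))              # 17.95
print("median =", round(median(data), 2))            # 17.4
print("range  =", round(max(data) - min(data), 2))   # 17.1
print("IQR    =", round(q75 - q25, 2))               # 3.0
```

Note that computing the median and the percentiles requires sorting the frame, whereas the mean, min, and max need only a single pass.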

Other statistical features
Sample difference (derivative): $\Delta x_i = x_{i+1} - x_i$. Effectively the rate of change in the measurements; useful for identifying peaks in signals.
Zero/mean crossing rate: the number of times the signal passes through zero or through its mean. Can be used, e.g., to identify events with symmetric changes in magnitude.
Magnitude (or L1-norm): $\|x\|_1 = \sum_{i=1}^{n} |x_i|$.
Root mean square (RMS): $\sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}$.
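These features are simple enough to write out directly. The following is a minimal sketch with function names chosen here; a crossing is counted as a strict sign change between consecutive samples, so samples that are exactly zero are not counted, which is one of several possible conventions.

```python
import math


def differences(x):
    """Sample differences x[i+1] - x[i]."""
    return [b - a for a, b in zip(x, x[1:])]


def zero_crossings(x):
    """Number of strict sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


def mean_crossings(x):
    """Number of times the signal crosses its own mean."""
    m = sum(x) / len(x)
    return zero_crossings([v - m for v in x])


def l1_norm(x):
    """Sum of absolute values."""
    return sum(abs(v) for v in x)


def rms(x):
    """Root mean square of the samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


frame = [1.0, -1.0, 1.0, -1.0]
print(differences(frame), zero_crossings(frame), l1_norm(frame), rms(frame))
```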

Example
How many zero-crossings and how many mean-crossings does the signal have? Answers: 8 and 13, respectively.
[Figure: example signal used for counting the crossings.]

Frequency Domain Features
Frequency domain features are extracted from the FFT output of a signal.
DC component: the zero-frequency coefficient of the FFT, which is proportional to the mean of the signal.
Spectral energy: the sum of the squared spectral coefficients, normalized by the length of the sample window.
Normalized coefficients: FFT coefficients divided by the sum of the FFT coefficients.
Energy at a given frequency, or at a given set of frequencies.
Dominant frequency, i.e., the frequency with the highest energy content.
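A few of these features can be computed as follows. The sketch uses a direct O(N^2) DFT to stay self-contained, searches for the dominant frequency only among the non-DC bins up to half the sampling rate, and uses names (`dft`, `spectral_features`) and a test signal that are assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def spectral_features(signal, sample_rate_hz):
    """DC component, spectral energy, and dominant frequency of a frame."""
    n = len(signal)
    magnitudes = [abs(c) for c in dft(signal)]
    dc = magnitudes[0]
    # Sum of squared coefficient magnitudes, normalized by the frame length.
    energy = sum(m * m for m in magnitudes) / n
    # Skip the DC bin and look only at the non-negative frequencies.
    k_max = max(range(1, n // 2 + 1), key=lambda k: magnitudes[k])
    return {"dc": dc, "energy": energy, "dominant_hz": k_max * sample_rate_hz / n}


n = 32  # one second of data at 32 Hz: offset 2 plus a 4 Hz sine
frame = [2.0 + math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
features = spectral_features(frame, sample_rate_hz=32)
print(features["dominant_hz"])  # 4.0
```

Excluding the DC bin from the search matters in practice: for signals with a non-zero mean, such as accelerometer data containing gravity, the DC coefficient would otherwise dominate.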

Examples
[Figure: original (speech) signal and its energy computed with frame size = 100.]

Other classes of features
Wavelets: an alternative to the FFT that represents the signal as a function of wavelet base functions. Several different transformations exist, e.g., Haar and Daubechies. Contrary to the FFT, wavelets lack a physical interpretation.
String features: quantize the values to a discrete set of possible ones, e.g., using piecewise approximations, and assign a character to each possible level. Signals can then be transformed into strings, and string similarity metrics can be used as features.
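The string transformation can be sketched as follows. This is a deliberately simplified illustration that quantizes each sample into equally wide levels between the minimum and maximum of the frame; it omits the piecewise approximation step, and the function name `to_string` and the four-letter alphabet are assumptions.

```python
def to_string(samples, alphabet="abcd"):
    """Map each sample to a character by uniform quantization."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # A constant signal falls entirely into the lowest level.
        return alphabet[0] * len(samples)
    levels = len(alphabet)
    chars = []
    for v in samples:
        # Scale to [0, levels) and clamp the maximum into the top level.
        idx = min(int((v - lo) / (hi - lo) * levels), levels - 1)
        chars.append(alphabet[idx])
    return "".join(chars)


print(to_string([0, 1, 2, 3, 4]))  # abcdd
```

Once two signals are strings over the same alphabet, any string similarity measure (for example edit distance) can be applied to them.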

Phase III: Inference
As discussed, the final phase refers to inference, where the goal is to estimate a given output.
Discrete categories: classification.
Numeric outputs/scores: regression.
Common classification techniques: Naive Bayes classifier, Support Vector Machines (SVM), decision trees, k-NN classifiers.
Common regression techniques: Support Vector Regression, Gaussian processes, linear regression.

Measuring Similarity
In many applications, features correspond to the similarity of two (or more) signals.
Applications that compare signals across devices, e.g., proximity sensing, authentication, and social sensing.
The moving object databases literature has applications that require finding similar movement traces.
Comparison of users in participatory and persuasive sensing applications.
Nearest neighbour classifiers rely on a similarity measure for signals (as discussed).
There are several possible similarity measures, and the best choice is often application dependent.

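Correlation
A measure of dependency between two signals; can be used as a measure of similarity.
Continuous data: Pearson product-moment correlation, $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$.
Ordinal (ranked) and skewed data: Spearman rank correlation, which converts the data into ranks and measures the differences $d_i$ between the ranks, $\rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$; and Kendall's tau.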
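Pearson and Spearman correlation can be sketched as follows. The Spearman coefficient is computed here as the Pearson correlation of the ranks, which coincides with the rank-difference formula when there are no ties; tied values receive their average rank. The function names are chosen here, and constant inputs (zero variance) are not handled.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4]
y = [1, 4, 9, 16]  # monotone but non-linear in x
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

The example shows why rank correlation suits skewed data: the relation is perfectly monotone, so the Spearman coefficient is 1 even though the Pearson coefficient is below 1.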

Autocorrelation and Cross-Correlation
Cross-correlation: the similarity of two signals as a function of the lag of one relative to the other. Useful for identifying signals that are copies of each other but shifted in time. Can be used, e.g., to synchronize two audio streams by using the lag with maximal cross-correlation.
Autocorrelation: the cross-correlation of a signal with itself. Measures the correlation of the signal values with themselves at different lags. Useful for finding periodic patterns, e.g., step counting can use autocorrelation to identify the gait cycle.
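Both uses can be illustrated with an unnormalized cross-correlation. This is a minimal sketch: the function names, the sign convention for the lag (a positive lag means the second signal is delayed), and the toy signals are assumptions, and real applications would usually normalize the values.

```python
def cross_correlation(x, y, lag):
    """Sum of x[i] * y[i + lag] over all overlapping indices."""
    total = 0.0
    for i in range(len(x)):
        j = i + lag
        if 0 <= j < len(y):
            total += x[i] * y[j]
    return total


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the highest cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(x, y, lag))


# y is a copy of x delayed by two samples.
x = [0, 1, 2, 1, 0, 0, 0]
y = [0, 0, 0, 1, 2, 1, 0]
print(best_lag(x, y, max_lag=3))  # 2

# Autocorrelation of a signal with period 3 peaks at lag 3.
periodic = [1, 0, 0, 1, 0, 0, 1, 0, 0]
print([cross_correlation(periodic, periodic, lag) for lag in (1, 2, 3)])
```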

Dynamic Time Warping
An elastic similarity measure that calculates the optimal match between two time series.
Allows warping in the time dimension, i.e., measurements can stretch or compress (e.g., due to different speeds).
The most widely used time-series similarity measure.
Does not satisfy the triangle inequality, so it is not a metric.
Calculated using dynamic programming.
Closely related to edit distance, but uses a dynamic distance as the penalty instead of a constant.

Dynamic Time Warping: Illustration
[Figure: two time series, A and B, and the alignment between them.]

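Dynamic Time Warping: Calculating
Initialization: $DTW(0,0) = 0$, $DTW(0,i) = DTW(i,0) = \infty$.
Iterate over the cost matrix, setting the value to:
$DTW(i,j) = d(a_i, b_j) + \min\{DTW(i-1,j),\ DTW(i,j-1),\ DTW(i-1,j-1)\}$
where $d(a_i, b_j)$ is the distance between the $i$-th measurement of the first signal and the $j$-th measurement of the second signal, e.g., Euclidean.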

Dynamic Time Warping: Example
Signal 1: -1.59, -1.32, -1.05, -0.98, -0.94, -0.89, -0.31, -0.31, -0.19, -0.83
Signal 2: 6.05, 5.72, 5.71, 5.97, 6.25, 6.25, 5.30, 5.30
Cost matrix, with DTW(0,0) = 0; rows i index Signal 1 and columns j index Signal 2:
i\j    1      2      3      4      5      6      7      8
1      7.6    15.0   22.3   29.8   37.7   45.5   52.4   59.3
2      15.0   14.7   21.7   29.0   36.6   44.1   50.8   57.4
3      22.1   21.5   21.4   28.5   35.8   43.1   49.4   55.7
4      29.1   28.2   28.1   28.4   35.6   42.8   49.1   55.4
5      36.1   34.8   34.8   35.0   35.6   42.8   49.0   55.2
6      43.1   41.4   41.4   41.6   42.2   42.7   48.9   55.1
7      49.4   47.5   47.4   47.6   48.2   48.7   48.3   53.9
8      55.8   53.5   53.4   53.7   54.2   54.8   53.9   53.9
9      62.0   59.4   59.3   59.6   60.1   60.7   59.4   59.4
10     68.9   65.9   65.8   66.1   66.6   67.2   65.6   65.5
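The dynamic programming recurrence can be sketched as follows, using the absolute difference as the distance between two one-dimensional measurements. The function name `dtw_matrix` is chosen here; the sketch fills the full matrix and applies none of the windowing or lower-bounding optimizations used in practice.

```python
import math


def dtw_matrix(a, b, dist=lambda p, q: abs(p - q)):
    """Cumulative DTW cost matrix; D[len(a)][len(b)] is the DTW distance."""
    n, m = len(a), len(b)
    # Row 0 and column 0 are infinite, except for the origin.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D


signal_1 = [-1.59, -1.32, -1.05, -0.98, -0.94, -0.89, -0.31, -0.31, -0.19, -0.83]
signal_2 = [6.05, 5.72, 5.71, 5.97, 6.25, 6.25, 5.30, 5.30]
D = dtw_matrix(signal_1, signal_2)
print(round(D[1][1], 2), round(D[2][2], 2))  # 7.64 14.68
print(round(D[-1][-1], 1))  # DTW distance between the two signals
```

The first two printed cells agree with the corresponding entries of the example matrix after rounding to one decimal; the bottom-right cell of the matrix is the DTW distance.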

Final Note: System Level View
As discussed during Lecture II, resource efficiency is a crucial design goal for mobile sensing systems.
Frequency-domain analysis is often quoted as energy heavy, but this is not really the case: the cost depends heavily on the FFT implementation.
The cost of distributing all sensor data is often MUCH higher than the cost of doing the processing on the device.
Offloading is mainly beneficial for heavy classifiers, and especially for their training phase.
Big-data frameworks (Hadoop, Spark, ...) are currently mainly useful for analysing large amounts of small data, e.g., small sets of features collected from many devices.

Summary
Signals, discrete samples, noise.
Signals can be analyzed in the time or the frequency domain; the (I)FFT is used to convert between the time and frequency domain representations.
Preprocessing techniques are needed to overcome noise in measurements: smoothing, filtering.
Feature extraction: summary and order statistic features, frequency domain features.
Similarity is another important aspect.
