Spatial Analysis - Clustering
Petteri Nurmi
24.11.2016

Questions
- How can GPS measurements be preprocessed?
- What different classes of spatial clustering algorithms exist?
- What is the difference between partitioning algorithms and density-based clustering?
- What is a place?
- How can places be detected?

Spatial Analysis
- Process of inspecting geographical data with the aim of extracting useful or meaningful information
- Spatial data analysis process:
  - Preprocessing: cleaning the data and performing transformations (if needed)
  - Analysis
    - Exploratory: the data is searched for models that describe it well, without a clear hypothesis
    - Confirmatory: hypotheses about the data are tested empirically
  - Post-processing
    - Cleaning noise from the identified patterns
    - Determining which of the detected patterns are meaningful

Measurements: Sampling
- Three main ways to collect measurements, referred to as sampling:
  - Periodic: every x seconds
    - Helps to save battery and reduce storage requirements
    - E.g., car and public transportation measurements are typically collected every 10 minutes (or even less often)
  - Distance-based: every x meters/miles
  - Continuous: as fast as possible
    - Depends on the location system; typically around 1 Hz with most systems
- What happens between samples?

Interpolation
- Method of constructing new data points within the range of a discrete set of known points
- Assumes two points (x_0, y_0) and (x_1, y_1) are given (e.g., x is time and y is a sensor value)
- Linear interpolation
  - Effectively a weighted average, where the weights depend on the distance from the known values: $y = y_0 + (y_1 - y_0)\frac{x - x_0}{x_1 - x_0}$
- Spline interpolation
  - Intervals between two points are modelled with low-order polynomials
  - The polynomial pieces are selected so that they fit smoothly with each other
- Needed for ensuring a consistent spatial and/or temporal sampling rate
  - Spatial interpolation: ensure measurements exist for every x meters
  - Temporal interpolation: ensure measurements exist for every t seconds
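Temporal interpolation can be sketched with NumPy's np.interp, which performs the piecewise linear, distance-weighted averaging described above. This is a minimal sketch, assuming timestamps in seconds and treating latitude and longitude as planar coordinates, which is only reasonable over short gaps; the function name resample_linear is hypothetical.

```python
import numpy as np

def resample_linear(t, lat, lon, step_s=10.0):
    """Linearly interpolate a trace onto a regular time grid (hypothetical helper).

    t        : sample timestamps in seconds, strictly increasing
    lat, lon : coordinates observed at those timestamps
    step_s   : target sampling interval in seconds
    """
    t = np.asarray(t, dtype=float)
    # Regular timestamps covering [t[0], t[-1]]; the small epsilon keeps t[-1] if it lies on the grid
    grid = np.arange(t[0], t[-1] + 1e-9, step_s)
    # np.interp returns, for every grid time, the weighted average of the two neighbouring samples
    return grid, np.interp(grid, t, lat), np.interp(grid, t, lon)

# Irregular samples (40 s and 60 s gaps) resampled to a 10 s interval
grid, lat_i, lon_i = resample_linear([0, 40, 100], [0.0, 4.0, 10.0], [0.0, 0.0, 0.0])
print(grid, lat_i)
```

Spline interpolation could be swapped in for the np.interp calls when smoother paths are needed; either way, the interpolated points are not guaranteed to obey physical constraints such as road geometry.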

Interpolation - Example
- Figure: a trace with a varying sample rate (37 to 122 seconds) and the same trace interpolated to a 10 s interval
- Note: map matching (covered later) can be used to ensure that the interpolated measurements obey physical constraints

Measurements: Noise
- Location measurements are inherently noisy, due to:
  - Reference point geometry
  - Atmospheric effects
  - Multipath effects
  - Measurement errors (clock or reference point errors)
- Preprocessing attempts to reduce noise before the data is analyzed further
- Data cleaning: ensure the quality of the measurements by checking the validity of the data

Preprocessing - GPS
- GPS requires at least 4 satellites for estimating a position (4 unknowns: 3D position + receiver clock offset)
- GPS uncertainty is affected by range error and satellite geometry
  - Dilution of Precision (DOP) gives an estimate of the influence of satellite geometry
  - Horizontal Dilution of Precision (HDOP) is the most important one for most applications
- Cold/warm starts can cause outliers in the measurements

Preprocessing GPS - Example: raw GPS measurements

Preprocessing GPS - Example: points with fewer than 4 satellites removed

Preprocessing GPS - Example: points with fewer than 4 satellites and points with HDOP > 6.0 removed
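The two filters in the example can be expressed as one predicate. This is a minimal sketch, assuming each fix carries the satellite count and HDOP reported by the receiver; the GpsFix record and filter_fixes function are hypothetical, and the thresholds (at least 4 satellites, HDOP at most 6.0) are the ones used in the example above.

```python
from dataclasses import dataclass

@dataclass
class GpsFix:            # hypothetical record for one GPS measurement
    t: float             # timestamp in seconds
    lat: float
    lon: float
    satellites: int      # satellites used in the position fix
    hdop: float          # horizontal dilution of precision

def filter_fixes(fixes, min_sats=4, max_hdop=6.0):
    """Drop fixes computed from too few satellites or with poor satellite geometry."""
    return [f for f in fixes if f.satellites >= min_sats and f.hdop <= max_hdop]

fixes = [
    GpsFix(0, 60.17, 24.94, satellites=3, hdop=1.0),   # too few satellites
    GpsFix(1, 60.17, 24.94, satellites=5, hdop=7.5),   # poor geometry
    GpsFix(2, 60.17, 24.94, satellites=6, hdop=1.2),   # kept
]
print([f.t for f in filter_fixes(fixes)])  # [2]
```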

Preprocessing - Removing Extreme Values

Preprocessing - Other Location Techniques
- Similar preprocessing techniques are required for other location systems
- GPS is slightly special, as the parameters it provides (e.g., number of satellites, HDOP) can be used to estimate the magnitude of errors
  - For most other techniques, errors need to be detected automatically
- Extreme value detection and interpolation are beneficial for any location system
  - A simple way to detect extreme values is to calculate the speed between successive measurements and to remove those that would require an implausibly high speed
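The speed check can be sketched as follows. This is a minimal sketch, assuming time-sorted (t, lat, lon) tuples in WGS84 degrees; distances use the haversine formula with a mean Earth radius, and the 50 m/s threshold and the function names are hypothetical choices. Each point is compared with the last accepted point rather than its raw predecessor, so a single outlier does not also cause the valid point after it to be removed.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def remove_speed_outliers(points, max_speed_mps=50.0):
    """points: list of (t_seconds, lat, lon) sorted by time; the first point is assumed valid."""
    if not points:
        return []
    kept = [points[0]]
    for t, lat, lon in points[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        # Keep the point only if reaching it from the last accepted point is physically plausible
        if haversine_m(lat0, lon0, lat, lon) / dt <= max_speed_mps:
            kept.append((t, lat, lon))
    return kept

track = [(0, 60.0, 24.0), (10, 60.0009, 24.0), (20, 61.0, 24.0), (30, 60.0018, 24.0)]
print([t for t, _, _ in remove_speed_outliers(track)])  # [0, 10, 30]
```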

Preprocessing - Example
- Data from indoor localization (retail)
- Two potential error areas can be observed

Spatial Clustering
- Clustering refers to the process of grouping similar objects into classes
  - Points within the same cluster are more similar to each other than to points in other clusters
- Spatial clustering refers to clustering applied to data with a geographical component, e.g.:
  - Identifying similar geographical areas, e.g., in terms of crime rate or another statistic
  - Merging regions with similar weather patterns

Spatial Clustering
- Four main categories of algorithms:
  - Partitioning methods (e.g., K-means, K-medoids)
  - Hierarchical methods (e.g., BIRCH)
  - Density-based methods (e.g., DBScan)
  - Grid-based methods (e.g., CLIQUE)
- The optimal technique depends on various factors:
  - Application goal
  - Trade-off between clustering quality and speed
  - Characteristics and dimensionality of the data
  - Amount of noise in the data

Spatial Clustering - Partitioning Algorithms
- Partition the data into k clusters so that the total deviation of points from their cluster center is minimized
- Parameter k determines the number of clusters; it is usually given beforehand
- Various ways to measure the total deviation:
  - Squared distance (K-Means)
  - Likelihood of the data (Gaussian Mixture Models)

Partitioning Algorithms - K-Means
- One of the best-known clustering algorithms
- Iterative relocation algorithm that minimizes the squared loss
  $E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - m_i \rVert^2$,
  where m_i is the center of cluster i and C_i is the set of points allocated to cluster i
- Basic structure:
  - Initialization: generate k cluster centers according to some criterion (e.g., random selection from the data)
  - During each iteration:
    - Allocate each point to the closest cluster
    - Revise the cluster centers based on the points assigned to each cluster
  - Repeat until the assignments no longer change

K-Means
- The algorithm is guaranteed to find a local optimum of the objective function (squared loss), but not necessarily the global one
- Sensitive to the initial choice of cluster centers
  - Clustering is typically repeated multiple times with different initial values, and the solution with the smallest total deviation is used
- Initial values can be determined, e.g., using:
  - Random sampling: select a fraction of the data, perform clustering on it, and use the resulting clusters as initial values
  - Data spectroscopy: analyze the spectral characteristics of the data to determine a good initial guess
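The basic K-means loop, combined with the restart strategy above, can be sketched in NumPy. This is a minimal sketch under stated assumptions: initial centers are drawn by random selection from the data, Euclidean distance is used, and the function name and defaults (n_restarts, max_iter, seed) are hypothetical. For location data, coordinates should be in a metric projection rather than raw latitude/longitude.

```python
import numpy as np

def kmeans(X, k, n_restarts=10, max_iter=100, seed=0):
    """K-means with random restarts; returns (centers, labels, loss) of the run with the smallest squared loss."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        # Initialization: k distinct data points chosen at random as the initial centers
        centers = X[rng.choice(len(X), size=k, replace=False)]
        labels = None
        for _ in range(max_iter):
            # Allocation: each point goes to the closest center (squared Euclidean distance)
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            new_labels = d2.argmin(axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break  # assignments unchanged: a local optimum has been reached
            labels = new_labels
            # Revision: move each center to the mean of its points (an empty cluster keeps its center)
            for i in range(k):
                if np.any(labels == i):
                    centers[i] = X[labels == i].mean(axis=0)
        loss = float(((X - centers[labels]) ** 2).sum())  # squared loss E
        if best is None or loss < best[2]:
            best = (centers.copy(), labels.copy(), loss)
    return best

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centers, labels, loss = kmeans(X, k=2)
print(labels, round(loss, 3))
```

Keeping the run with the smallest loss is the "repeat with different initial values" strategy; it reduces, but does not remove, the risk of ending in a poor local optimum.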

K-Means - Example

K-Means - Determining k
- The most common method is to examine how the objective function changes as a function of k
  - Cluster with different values of k and select the one that optimizes a selection metric
- KL index: measures the relative change between two successive values of k
  - The cost refers to the objective function; in the case of K-means, the sum of squares
- Scree plot: plot the error as a function of k and select the knee (or dip) point, i.e., the point where the error changes clearly
  - Not guaranteed to exist, and often chosen heuristically based on visual inspection
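One common formulation of the KL index is the one by Krzanowski and Lai, assumed here because the slide's own formula is not reproduced: with $W_k$ the cost for k clusters and p the data dimensionality, $\mathrm{DIFF}(k) = (k-1)^{2/p} W_{k-1} - k^{2/p} W_k$ and $\mathrm{KL}(k) = |\mathrm{DIFF}(k) / \mathrm{DIFF}(k+1)|$, and the k with the largest KL(k) is selected. The sketch below computes it from precomputed costs; the function name and the example costs are hypothetical.

```python
def kl_index(cost, p):
    """Krzanowski-Lai index.

    cost : dict mapping k -> W_k, the clustering cost (e.g., K-means sum of squares) for k clusters
    p    : dimensionality of the data
    Returns {k: KL(k)} for every k whose neighbours k-1 and k+1 are also in cost.
    """
    def diff(k):
        return (k - 1) ** (2 / p) * cost[k - 1] - k ** (2 / p) * cost[k]

    scores = {}
    for k in sorted(cost):
        if k - 1 in cost and k + 1 in cost:
            denominator = diff(k + 1)
            if denominator != 0:  # skip undefined ratios
                scores[k] = abs(diff(k) / denominator)
    return scores

costs = {1: 100.0, 2: 40.0, 3: 10.0, 4: 8.0, 5: 7.0}  # hypothetical W_k values for 2-D data
scores = kl_index(costs, p=2)
print(max(scores, key=scores.get))  # 3
```

The costs would normally come from running K-means for each k (ideally with restarts); the smallest and largest k in the range cannot be scored because they lack a neighbour.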

Partitioning Algorithms - Probabilistic Clustering
- Generative: the data is assumed to be generated according to some model
  - The parameters of the model are unknown and need to be estimated from the data
  - Returns a probability distribution over the parameter values
- Two possible ways to assign points to clusters:
  - Hard: each point belongs to exactly one cluster
  - Soft: multiple (or all) clusters are allowed to contribute to the generation of a point

Partitioning Algorithms - Mixture Models
- Mixture models provide a flexible and generic approach to probabilistic clustering
- The data is generated by k random variables, each variable X_i characterized by a probability density function f_i(x | θ_i)
  - For each point x_j, a hidden (unobservable) variable c_j determines the cluster to which x_j belongs
  - The clusters are called mixture components
- The probability of a point is a convex combination of the mixture component densities:
  $p(x) = \sum_{i=1}^{k} \pi_i f_i(x \mid \theta_i)$, with $\pi_i \ge 0$ and $\sum_{i=1}^{k} \pi_i = 1$,
  where $\pi_i$ defines the weight (contribution) of component i

Partitioning Algorithms - Gaussian Mixture Models
- Mixture model where the mixture components are assumed to have a Gaussian distribution
  - The mean μ_i determines the center of the cluster
  - The covariance matrix Σ_i determines the shape of the cluster
- Assuming Euclidean distances:
  - The shape is a circle if the variance of all dimensions is equal (and the covariances are zero)
  - The shape is an ellipse aligned with the coordinate axes when the covariance matrix is diagonal
  - The shape is a tilted ellipse when a full covariance matrix is used
- K-means can be understood as a special case of a Gaussian mixture model where all components have equal, spherical variance and points are assigned to clusters with hard assignments

Partitioning Algorithms - Gaussian Mixture Models
- Cluster parameters can be determined using the expectation maximization (EM) algorithm
  - Iterative algorithm for finding (locally) optimal parameter values in models with latent (i.e., unobservable) variables
  - Consists of two steps (E and M) that are iterated until the solution converges
- Algorithm outline:
  - Initialization: draw initial parameter values
  - E-step: compute the expectation of the log-likelihood using the current estimates
  - M-step: compute the parameters that maximize the expected log-likelihood computed in the E-step
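The EM outline above can be sketched for a Gaussian mixture with full covariance matrices. This is a minimal NumPy sketch, not a production implementation: the farthest-first initialization, the isotropic starting covariances, the fixed iteration count and the small reg term added to each covariance diagonal (to keep it invertible) are assumptions, and densities are computed directly rather than in log space, so points far from every component may underflow.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density N(x | mean, cov) for every row x of X."""
    d = X.shape[1]
    diff = X - mean
    mahal = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)  # squared Mahalanobis distance
    return np.exp(-0.5 * mahal) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def gmm_em(X, k, n_iter=100, seed=0, reg=1e-6):
    """Fit a k-component GMM by EM; returns weights, means, covariances and responsibilities."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Initialization (farthest-first heuristic): a random point, then repeatedly the point farthest from those chosen
    means = [X[rng.integers(n)]]
    for _ in range(1, k):
        d2 = np.min([((X - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(X[d2.argmax()])
    means = np.array(means)
    covs = np.array([np.eye(d) * X.var(axis=0).mean() for _ in range(k)])  # isotropic start
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[j, i] = P(c_j = i | x_j) under the current parameters
        dens = np.column_stack([weights[i] * gaussian_pdf(X, means[i], covs[i]) for i in range(k)])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and full covariances from the soft assignments
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for i in range(k):
            diff = X - means[i]
            covs[i] = (resp[:, i, None] * diff).T @ diff / nk[i] + reg * np.eye(d)
    return weights, means, covs, resp

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, (100, 2)), rng.normal([5, 5], 0.5, (100, 2))])
weights, means, covs, resp = gmm_em(X, k=2)
print(np.round(weights, 2), np.round(means, 2))
```

The rows of resp are the soft assignments; resp.argmax(axis=1) turns them into hard assignments if needed.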

Partitioning Algorithms - Infinite Mixture Models
- A generalization of mixture models where the number of mixture components is assumed to be infinite (but countable)
- Example: Chinese restaurant process
  - Customers arrive at a restaurant with an infinite number of circular tables, each having infinite capacity
  - When a new customer arrives, (s)he selects a table to sit at:
    - either one of the partially occupied tables,
    - or a completely new table
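A short simulation makes the seating rule concrete. The slide does not specify the probabilities, so the standard formulation with a concentration parameter alpha is assumed here: the (n+1)-th customer joins an occupied table with probability proportional to the number of customers already sitting there, or opens a new table with probability proportional to alpha. The function name and defaults are hypothetical.

```python
import random

def chinese_restaurant_process(n_customers, alpha=1.0, seed=0):
    """Seat customers one by one and return the table index chosen by each customer.

    With n customers already seated, occupied table t is chosen with probability
    size(t) / (n + alpha) and a new table with probability alpha / (n + alpha).
    """
    rng = random.Random(seed)
    table_sizes = []  # number of customers at each occupied table
    seating = []
    for n in range(n_customers):
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        choice = len(table_sizes)  # default: open a new table
        for t, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                choice = t
                break
        if choice == len(table_sizes):
            table_sizes.append(0)
        table_sizes[choice] += 1
        seating.append(choice)
    return seating

print(chinese_restaurant_process(20, alpha=1.0))
```

Larger tables attract new customers more strongly ("rich get richer"), and the expected number of occupied tables grows only logarithmically with the number of customers, which is why the process yields a finite number of clusters for any finite dataset.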

Partitioning Algorithms - K-Medoids
- Partitioning algorithm that represents a cluster using its most centrally located measurement (the medoid)
- Instead of updating all centers during an iteration, typically updates only a single medoid
- How to determine the new medoid? How to evaluate the effectiveness of the clustering?
  - Covered in more detail during Lecture VIII

Density-Based Algorithms
- Class of algorithms that represent clusters as dense regions of objects
  - In contrast to partitioning algorithms, they can derive clusters of arbitrary shape
  - Areas with a low density of objects are considered noise
- Basic concepts:
  - Epsilon neighborhood: the collection of points that are within distance Eps from a point
  - Dense neighborhood: an Epsilon neighborhood that contains at least MinPts points

Density-Based Algorithms - Radius-Based Clustering
- Predecessor to density-based clustering
- All points within distance Eps of each other are assigned to the same cluster
- MinPts or some other criterion can be used to prune the resulting clusters
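Radius-based clustering can be sketched as a connected-components search. The slide leaves open whether "within distance Eps of each other" must hold for every pair or only along a chain of neighbours; this sketch assumes the chained (transitive) interpretation, uses a union-find structure, and prunes clusters with fewer than min_pts members as suggested above. The function name and parameters are hypothetical, and the pairwise loop is O(n^2).

```python
import math

def radius_clustering(points, eps, min_pts=1):
    """Two points share a cluster if a chain of neighbours, each within eps of the next,
    connects them. Clusters with fewer than min_pts members are pruned and labelled -1."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Union every pair of points that lie within eps of each other
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(len(points))]
    sizes = {r: roots.count(r) for r in set(roots)}
    ids = {}
    labels = []
    for r in roots:
        if sizes[r] < min_pts:
            labels.append(-1)  # pruned: too few points
        else:
            labels.append(ids.setdefault(r, len(ids)))
    return labels

pts = [(0, 0), (0, 1), (0, 2), (10, 10), (10, 11), (50, 50)]
print(radius_clustering(pts, eps=1.5, min_pts=2))  # [0, 0, 0, 1, 1, -1]
```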

Radius-Based Clustering - Example

Density-Based Algorithms - DBScan
- A point that has at least MinPts points within its Epsilon neighborhood is called a core object
- An object can only belong to a cluster if it is within the Epsilon neighborhood of at least one core object
- A core object o within the Epsilon neighborhood of another core object p must belong to the same cluster as p
- A non-core object belonging to the Epsilon neighborhood of some core objects must belong to the same cluster as one of these core objects
- Non-core objects that do not belong to the Epsilon neighborhood of any core object are noise

Density-Based Algorithms - DBScan
- Figure: core objects, non-core objects and an outlier (noise)
- Clusters A, B and C can be merged since they share a core object

Density-Based Algorithms - DBScan
- Algorithm that recursively merges Epsilon neighborhoods together to identify dense regions
  - Let c be a core object; the points within the Epsilon neighborhood of c are considered seed points
  - The cluster is expanded with (previously unallocated) points that are within the Epsilon neighborhood of a seed point that is itself a core object
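The expansion procedure can be sketched in plain Python. This is a minimal sketch of DBScan with a brute-force O(n^2) neighbourhood search; it replaces the recursion with an explicit queue of seed points, which expands the same clusters without deep call stacks. The names dbscan and NOISE are hypothetical, and a border point reachable from several clusters keeps the first cluster that claims it, which the definitions above allow.

```python
import math
from collections import deque

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE. Brute-force O(n^2) search."""
    n = len(points)

    def neighborhood(i):
        # Epsilon neighbourhood of point i; it includes i itself
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None = not visited yet
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighborhood(i)
        if len(seeds) < min_pts:  # not a core object: noise unless a cluster claims it later
            labels[i] = NOISE
            continue
        labels[i] = cluster  # i is a core object and starts a new cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:  # border point seen earlier: claim it, but do not expand from it
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighborhood(j)
            if len(nbrs) >= min_pts:  # j is a core object: its neighbourhood becomes new seeds
                queue.extend(nbrs)
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Here (2.2, 1) is a non-core border point of the first cluster and (50, 50) is noise. A spatial index (e.g., a k-d tree) would replace the linear neighbourhood scan for larger datasets.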

DBScan - Example (figure: detected clusters and noise points)

Density-Based Algorithms - DJCluster
- Variant of DBScan where cluster expansion is performed iteratively instead of recursively
  - Better suited for large datasets
- Basic idea:
  - Find the Epsilon neighborhood of a point
  - Assign all points within the neighborhood to a cluster
  - Check whether the cluster shares a core point with any of the previous clusters
    - If so, the clusters can be merged
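The DJCluster idea can be sketched as a single pass over the points. This is a minimal sketch, not the reference implementation by Zhou et al.: following the slide, it assumes that a new dense neighbourhood is merged with every earlier cluster that has one of its core points inside that neighbourhood; non-core points only join clusters as members, and a point inside several clusters keeps the first one. The function name and parameters are hypothetical.

```python
import math

def dj_cluster(points, eps, min_pts):
    """Iterative (non-recursive) density-based clustering; returns a label per point, -1 = noise."""
    n = len(points)
    clusters = []  # each cluster: {"members": set of indices, "cores": set of core indices}
    for p in range(n):
        nbhd = {q for q in range(n) if math.dist(points[p], points[q]) <= eps}
        if len(nbhd) < min_pts:
            continue  # p is not a core object; it may still be a member of another neighbourhood
        new = {"members": set(nbhd), "cores": {p}}
        remaining = []
        for c in clusters:
            if c["cores"] & nbhd:  # shares a core point: merge the earlier cluster into the new one
                new["members"] |= c["members"]
                new["cores"] |= c["cores"]
            else:
                remaining.append(c)
        clusters = remaining + [new]
    labels = [-1] * n
    for cid, c in enumerate(clusters):
        for q in c["members"]:
            if labels[q] == -1:  # a point in several clusters keeps the first one
                labels[q] = cid
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dj_cluster(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```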

Notion of Place
- Location systems tend to provide information in coordinate form (absolute or relative)
- People refer to locations using semantic (or symbolic) descriptions
  - Descriptions of the same place can vary between different people
- Place: a representation of location that is consistent with the way people communicate location information

Notion of Place - Example (figure: Petra, Jordan, with the places Monastery, Church, Royal Tombs, Hotel, Treasury and Ticket Office)

Notion of Place
- Definitions of place originate from the field of humanistic geography
  - Roots in phenomenology and philosophy, especially the philosophy of Martin Heidegger
- Places are entities that relate physical locations with human experiences and meanings
  - Relph: places are physical locations that are linked with meanings and activities
  - Tuan: places are spaces (i.e., physical locations) that are embodied with meanings

Notion of Place
- The meanings attributed to places vary:
  - Activities: swimming hall, movie theater, gym
  - Social: a friend's home, a regular place to meet friends
  - Generic: library, grocery store, train station
- Multiple meanings can be attributed to a place
  - They relate to different activities (and times) at the place
- Places can be perceived as public or private
  - Note: a space can be public even if the place is private!
  - Depends on the activity, time of day, etc.
  - Influences preferences regarding location disclosure

Why does place matter?
- Personalized information delivery
  - E.g., associate notes/to-do lists with places
  - Select advertisements or other information to provide, e.g., train or bus schedules
  - Depends on the stability of the information and the familiarity of the place
- Awareness cue
  - Places are often a cue of activity and availability
  - Automated status messages, e.g., in a phone contact list
- Support for user studies
  - Differentiating meaningful situations in the analysis phase

Detecting Places
- Locations correlate strongly with activities
  - "What are you doing?" is often answered with a location during mobile phone calls
  - People assign activity-related labels to places
- Places correlate with time
  - Humans spend the majority of their time in a few places
  - The probability of labeling a place increases with the time spent there
  - But traffic stops (traffic jams, traffic lights) are seldom labeled
- ⇒ Places can be detected from location traces
  - Activity information can help (if available)

Place Identification
- Place identification = the process of detecting places from data
- A data analysis process with four steps:
  - Preparation: clean the data, transform the data
  - Preprocessing: make the data ready for analysis
  - Analysis: perform the actual analysis
  - Post-processing: refine the results
- Additionally, a labeling step
  - Assigns semantics to the detected places
  - Can take place before or after the analysis

Labeling
- A common choice is to prompt the user to label a place after it has been detected
- An alternative is to label first and learn the places automatically based on the labels
- Some labels can be assigned automatically
  - Geographic databases can be used to mine information about the type of building
  - Time information can be used to identify home and workplace
- Different modalities: text, photo, photo + text

Detecting Places - Overview
- Most place detection algorithms operate on coordinate data
- Pruning: remove measurements that are unlikely to be meaningful
- Clustering: apply spatial clustering to the data
- Post-processing: determine which clusters are likely to correspond to meaningful places
  - Spatial criteria: matching against geo-databases, considering the size of clusters, etc.
  - Temporal criteria: requiring a minimum stay duration
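The temporal criterion can be sketched as a post-processing step over the time-ordered cluster labels produced by any of the clustering algorithms above. This is a minimal sketch, assuming a visit is a maximal run of consecutive measurements with the same label and that noise is labelled -1; the function name and the 10-minute default are hypothetical.

```python
def filter_by_stay(times, labels, min_stay_s=600.0):
    """Return the cluster labels that were visited for at least min_stay_s in a single visit.

    times  : measurement timestamps in seconds, sorted ascending
    labels : cluster label of each measurement (-1 = noise, not part of any cluster)
    A visit is a maximal run of consecutive measurements with the same label; its
    duration is the time between the first and the last measurement of the run.
    """
    places = set()
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the trace or where the label changes
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != -1 and times[i - 1] - times[start] >= min_stay_s:
                places.add(labels[start])
            start = i
    return places

times = [0, 300, 660, 700, 760, 1400, 1460]
visit_labels = [0, 0, 0, 1, 1, 2, 2]
print(filter_by_stay(times, visit_labels))  # {0}
```

Measuring a visit from its first to its last measurement underestimates the true stay by up to one sampling interval, so the threshold interacts with the sampling rate.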

Detecting Places - Velocity Pruning
- Measurements taken while the user is moving are unlikely to correspond to significant places
- Velocity can be used to prune measurements, and clustering is then applied to the remaining data
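Velocity pruning can be sketched with the speed between each measurement and its predecessor. This is a minimal sketch, assuming time-sorted (t, x, y) tuples already projected to a local metric coordinate system; the 1 m/s threshold (somewhat below walking speed) and the function name are hypothetical. Unlike extreme value removal, each point is compared with its raw predecessor, because the goal is to detect motion rather than erroneous fixes.

```python
import math

def velocity_prune(trace, max_speed_mps=1.0):
    """trace: list of (t_seconds, x_m, y_m) sorted by time.

    Keeps measurements whose speed from the previous measurement is at most max_speed_mps;
    the first measurement has no predecessor and is kept.
    """
    kept = trace[:1]
    for (t0, x0, y0), (t1, x1, y1) in zip(trace, trace[1:]):
        dt = t1 - t0
        if dt > 0 and math.hypot(x1 - x0, y1 - y0) / dt <= max_speed_mps:
            kept.append((t1, x1, y1))
    return kept

trace = [(0, 0, 0), (10, 2, 0), (20, 200, 0), (30, 201, 0)]
print(velocity_prune(trace))  # [(0, 0, 0), (10, 2, 0), (30, 201, 0)]
```

The remaining measurements would then be passed to one of the spatial clustering algorithms above.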

Place Detection - Further Topics
- Coordinate-based algorithms are unable to separate between different places within the same indoor space
- Radio fingerprinting based place detection uses the stability of the signal environment to detect places
  - Current state of the art in mobile phone based place detection
  - Performance decreases in areas with a limited signal environment
- Hybrid algorithms
  - Combine coordinate-based techniques with radio fingerprinting based place detection

Fingerprint-based Place Detection
- The basic idea is to compare the similarity of fingerprint information over time
  - If the radio environment is sufficiently similar over a time window t, the user is assumed to be in a place
- Many possible ways to measure the similarity of RF environments:
  - Rank correlation (NearMe)
  - Extended Tanimoto (SensLoc)
  - Normalized Euclidean distance

Fingerprint-based Place Detection - Example
- Consider the following received signal strength values of two MAC addresses in three scans:
    Scan   MAC 1   MAC 2
    A      -82     -74
    B      -84     -79
    C      -40     -40
- ExtTanimoto(A,B) = (-82 * -84 + -74 * -79) / (82^2 + 74^2 + 84^2 + 79^2 - (-82 * -84 + -74 * -79)) = 12734 / 12763 ≈ 0.9977
- ExtTanimoto(A,C) = 6240 / 9160 ≈ 0.68
- A and B are from the same location with high probability; C is likely from a different location
- If successive measurements remain similar for, e.g., 5 or 10 minutes, the user is assumed to be in a place
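The Extended Tanimoto similarity used in the example, $T(A,B) = \frac{A \cdot B}{\lVert A \rVert^2 + \lVert B \rVert^2 - A \cdot B}$, and a simple dwell check can be sketched as follows. This is a minimal sketch, assuming both scans list the same access points in the same order (how SensLoc handles access points missing from one scan is not covered here); the in_place helper and its 0.9 threshold are hypothetical.

```python
def ext_tanimoto(a, b):
    """Extended Tanimoto similarity of two equal-length signal strength vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)

def in_place(scans, threshold=0.9):
    """Hypothetical dwell check: every pair of successive scans in the window is similar enough."""
    return all(ext_tanimoto(s, t) >= threshold for s, t in zip(scans, scans[1:]))

A, B, C = [-82, -74], [-84, -79], [-40, -40]
print(round(ext_tanimoto(A, B), 4))  # 0.9977
print(round(ext_tanimoto(A, C), 2))  # 0.68
print(in_place([A, B, A]), in_place([A, C]))  # True False
```

In practice the window of scans would span the chosen dwell time (e.g., 5 or 10 minutes, as above).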

Case Study: Zero-Interaction Authentication (ZIA)
- Fingerprint similarity is a generic tool with many other applications; as an example, we consider ZIA
- Assume device B unlocks automatically whenever device A is in close proximity (zero user interaction)
  - E.g., car locks, token-based authentication for laptops/terminals
- Susceptible to relay attacks, where another device pretends to be A
- If A and B compare their WiFi environments, the similarity of these environments can be used to resist relay attacks

Summary
- Spatial analysis refers to the process of inspecting geographical data
  - Preprocessing: cleaning and preparing data for analysis
  - Analysis: exploratory or confirmatory
  - Post-processing: validating and pruning results
- Spatial clustering
  - Grouping similar (spatial) objects together
  - Partitioning algorithms: divide the data optimally into clusters
  - Density-based algorithms: identify dense spatial regions

Summary
- Place
  - Representation of location that is consistent with the way people communicate location information
  - Semantic / symbolic
- Place detection
  - Process of identifying places from location measurements
  - On coordinate data, can be solved using spatial clustering combined with temporal and spatial pruning

Literature
- Ester, M.; Kriegel, H.-P.; Sander, J. & Xu, X., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), AAAI, 1996, 226-231
- Sander, J.; Ester, M.; Kriegel, H.-P. & Xu, X., Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications, Data Mining and Knowledge Discovery, 1998, 2, 169-194
- Zhou, C.; Frankowski, D.; Ludford, P.; Shekhar, S. & Terveen, L., Discovering Personally Meaningful Places: An Interactive Clustering Approach, ACM Transactions on Information Systems, 2007, 25, 12
- Ashbrook, D. & Starner, T., Learning significant locations and predicting user movement with GPS, Proceedings of the 6th International Symposium on Wearable Computers (ISWC), IEEE, 2002, 101-108
- Kang, J.; Welbourne, W.; Stewart, B. & Borriello, G., Extracting places from traces of locations, Proceedings of the 2nd ACM International Workshop on Wireless Mobile Applications and Services on WLAN Hotspots (WMASH), ACM Press, 2004, 110-118

Literature
- Liao, L.; Fox, D. & Kautz, H., Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields, International Journal of Robotics Research, 2007, 26, 119-134
- Marmasse, N. & Schmandt, C., A user-centered location model, Personal and Ubiquitous Computing, 2002, 6, 318-321
- Nurmi, P. & Bhattacharya, S., Identifying Meaningful Places: The Nonparametric Way, Proceedings of the 6th International Conference on Pervasive Computing (Pervasive), Springer, 2008, 5013, 111-127
- Tuan, Y.-F., Space and Place: The Perspective of Experience, University of Minnesota Press, 2001
- Relph, E., Place and Placelessness, Pion Books, 1976
- Han, J.; Kamber, M. & Tung, A. K. H., Spatial Clustering Methods in Data Mining: A Survey, Geographic Data Mining and Knowledge Discovery, Taylor & Francis, 2001

Literature
- Kim, D. H.; Kim, Y.; Estrin, D. & Srivastava, M. B., SensLoc: Sensing Everyday Places and Paths Using Less Energy, Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems (SenSys), ACM, 2010, 43-56
- Hightower, J.; Consolvo, S.; LaMarca, A.; Smith, I. & Hughes, J., Learning and Recognizing the Places We Go, Proceedings of the 7th International Conference on Ubiquitous Computing (UBICOMP), Springer-Verlag, 2005, 3660, 159-176
- Truong, H. T. T.; Gao, X.; Shrestha, B.; Saxena, N.; Asokan, N. & Nurmi, P., Comparing and Fusing Different Sensor Modalities for Relay Attack Resistance in Zero-Interaction Authentication, Proceedings of the 12th International Conference on Pervasive Computing and Communications (PerCom), 2014