Spatial Analysis Clustering Petteri Nurmi

Spatial Analysis Clustering Petteri Nurmi 2.2.2012 1

Questions What kind of preprocessing steps are useful for GPS measurements? What different classes of spatial clustering exist? What is the difference between partitioning algorithms and density-based clustering? What is a place? How places can be detected? 2.2.2012 2

Spatial Analysis Process of inspecting geographical data with the aim of extracting useful information Spatial data analysis process Preprocessing Cleaning the data, perform transformations (if needed) Analysis Exploratory: data is searched for models that describe it well without clear hypothesis Confirmatory: hypotheses about data are tested empirically Post-processing 2.2.2012 3

Measurement Noise Location measurements are inherently noisy Reference point geometry Atmospheric effects Multipath effects Measurement errors (clock or reference point errors) See Lecture IV Preprocessing attempts to reduce noise before data is being analyzed further Data cleaning: ensure quality of measurements Check the validity of the data 2.2.2012 4

Preprocessing - GPS GPS requires at least 4 satellites for estimating position (4 unknowns: 3D position + time offset) GPS uncertainty affected by range error and satellite geometry Dilution of Precision gives an estimate of the influence of satellite geometry Horizontal Dilution of Precision (HDOP) most important for applications Cold/warm start can cause outliers in measurements 2.2.2012 5

Preprocessing GPS Example RAW GPS measurements 2.2.2012 6

Preprocessing GPS Example Points with satellites < 4 removed 2.2.2012 7

Preprocessing GPS Example Points with satellites < 4 and HDOP > 6.0 removed 2.2.2012 8

Preprocessing Removing Extreme Values 2.2.2012 9

Spatial Clustering Clustering refers to the process of grouping similar objects into classes Points within same cluster more similar to each other than to those in other clusters Spatial clustering refers to clustering that is applied on data with a geographical component Identifying similar geographical areas, e.g., in terms of crime rate or another statistic Merging of regions with similar weather patterns 2.2.2012 10

Spatial Clustering Four main categories of algorithms Partitioning methods (e.g., K-means, K-medoids) Hierarchical methods (e.g., BIRCH) Density-based methods (e.g., DBScan) Grid-based methods (e.g., CLIQUE) Optimal technique depends on various factors Application goal Trade-off between clustering quality and speed Characteristics and dimensionality of data Amount of noise in data 2.2.2012 11

Spatial Clustering - Partitioning Algorithms Partition data into k clusters so that total deviation of points from their cluster center is minimized Parameter k determines the number of clusters, given usually beforehand Various ways to measure total deviation: Squared distance (K-Means) Posterior of data (Gaussian Mixture Models) 2.2.2012 12

Partitioning Algorithms K-Means One of the best-known clustering algorithms Iterative relocation algorithm, optimizes squared loss m i corresponds to the center of a cluster, C i is the set of points allocated to cluster i Basic structure: Initialization: generate k cluster centers according to some criterion (e.g., random selection from data) During each iteration: Allocate each point to the cluster that is closest Revise cluster centers based on the points that are assigned to the cluster Repeat until no change in values 2.2.2012 13

K-Means Algorithm guaranteed to find a local optimum of the objective function (squared loss) Sensitive to the initial choice of cluster centers Clustering typically repeated multiple times with different initial values and solution with smallest total deviation used Initial values can be determined, e.g., using Random sampling Select fraction of data, perform clustering on that, use resulting clusters as initial values Data spectroscopy: analyze spectral characteristics of data values to determine a good initial guess 2.2.2012 14

K-Means - Example 2.2.2012 15

K-Means Example 2.2.2012 16

Partitioning Algorithms Probabilistic Clustering Generative: data assumed to be generated according to some model Parameters of the model unknown and need to be estimated from data Returns a probability distribution over the parameter values Two possible assignments of points to cluster Hard: each point belongs exactly to one cluster Soft: allow multiple (or all) clusters to contribute to the generation of the point 2.2.2012 17

Partitioning Algorithms Mixture Models Mixture Models provide a flexible and generic approach to probabilistic clustering Data generated by k random variables, each variable X i characterized by probability density function f i ( i ) For each point i, a hidden and unobservable variable c i determines the cluster where i belongs to The clusters are called mixture components Probability of a point is a (convex) combination of the mixture component densities defines the weight or contribution of a component 2.2.2012 18

Partitioning Algorithms Gaussian Mixture Models Mixture model where mixture components are assumed to have a Gaussian distribution Mean i determines the center of the cluster Covariance matrix i determines shape of the cluster Assuming Euclidean distances: Shape is circle if variance of all dimensions is equal Shape is an ellipse aligned with coordinate axes when covariance matrix is diagonal Shape is a tilted ellipse when full covariance matrix used K-means can be understood as a Gaussian mixture model where variance is equal 2.2.2012 19

Partitioning Algorithms Gaussian Mixture Models Cluster parameters can be determines using the expectation maximization (EM) algorithm Iterative algorithm for finding optimal parameter values in models with latent (i.e., unobservable) variables Consists of two steps (E and M) which are iterated until solution converges Algorithm outline: Initialization: draw initial parameter values E-step: compute expectation of log-likelihood using current estimates M-step: compute parameters that maximize the expected log-likelihood computed in the E-step 2.2.2012 20

Partitioning Algorithms Infinite Mixture Models A generalization of mixture models where number of mixture models is assumed infinite (but countable) Example: Chinese restaurant process Customers arrive to a restaurant with an infinite number of circular tables, each having infinite capacity As new customer arrives (s)he selects the table to sit Either one of the partially occupied tables Or completely new table 2.2.2012 21

Partitioning Algorithms K-Medoids Partitioning algorithm that represents a cluster using the most centrally located measurement Instead of updating all centers during an iteration, typically updates only a single medoid How to determine the new medoid? How to evaluate effectiveness of clustering? Covered in more detail during Lecture VIII 2.2.2012 22

Density-Based Algorithms Class of algorithms that represent clusters as dense regions of objects In contrast to partitioning algorithms, can derive clusters of arbitrary shape Areas with low-density of objects are considered noise Basic concepts Epsilon neighborhood: collection of points that are within distance Eps from a point Dense neighborhood: Epsilon neighborhood that contains at least MinPts points 2.2.2012 23

Density-Based Algorithms Radius-Based Clustering Predecessor to density-based clustering Cluster all points with distance Eps of each other to the same cluster MinPts or some other criterion can be used to prune the resulting clusters 2.2.2012 24

Radius-Based Clustering Example 2.2.2012 25

Density-Based Algorithms DBScan A point that has at least MinPts within its Epsilon neighborhood is called a core object Object can only belong to a cluster if it is within the Epsilon neighborhood of at least one core object Core object o within Epsilon neighborhood of another core object p must belong to the same cluster as p Non-core object belonging to the Epsilon neighborhood of some core objects must belong to the same cluster as one of these core objects Non-core objects which do not belong to the Epsilon neighborhood of any core objects are noise 2.2.2012 26

Density-Based Algorithms DBScan Non-core object Core object Outlier / noise Core object Clusters A,B and C can be merged since they share a core object 2.2.2012 27

Density-Based Algorithms DBScan Algorithm that recursively merges Epsilon neighborhoods together to identify dense regions Let c be a core object, within the Epsilon neighborhood of c considered as seed points Cluster expanded with (previously unallocated) points that are within the Epsilon neighborhood of a seed point 2.2.2012 28

DBScan example Noise Clusters 2.2.2012 29

Density-Based Algorithms DJCluster Variant of DBScan where cluster expansion performed iteratively instead of recursively Better suited for large datasets Basic idea: Find Epsilon neighborhood of a point Assign all points within the neighborhood into cluster Check if cluster shares a core point with any of the previous clusters If so, clusters can be merged 2.2.2012 30

Notion of Place Location systems tend to provide information in coordinate form (absolute or relative) People refer to locations using semantic (or symbolic) descriptions Descriptions for the same place can vary between different people Place Representation of location that is consistent with the way people communicate location information 2.2.2012 31

Notion of Place Monastery Petra, Jordan Church Royal Tombs Hotel Treasury Ticket Office 2.2.2012 32

Notion of Place Definitions for place originate from the field of humanistic geography Roots in phenomenology and philosophy Especially philosophy of Martin Heidegger Places entities that relate physical locations with human experiences and meanings Relph: places physical locations that are linked with meanings and activities Tuan: places are spaces (i.e., physical locations) that are embodied with meanings 2.2.2012 33

Notion of Place The meanings attributed to places vary: Activities: swimming hall, movie theater, gym Social: friend s home, regular place to meet friends Generic: library, grocery store, train station Multiple meanings can be attributed to a place Relate to different activities (and times) at the place Places can be perceived as public or private Note: space can be public even if place is private! Depends on the activity, time of day etc. Influences preferences regarding location disclosure 2.2.2012 34

Why place matters? Personalized information delivery E.g., associate notes/to-do lists with places Select advertisements or other information to provide E.g., provide train or bus schedules Depends on stability of information and familiarity of place Awareness cue Places often a cue of activity and availability Automated status messages, e.g., in phone contact list Support user studies Differentiating meaningful situations in analysis phase 2.2.2012 35

Detecting places Locations correlate strongly with activities What are you doing? often answered with location during mobile phone calls People assign activity-related labels to places Places correlate with time Humans spend the majority of time in a few places Probability of labeling a place increases with time But traffic stops (traffic jams, traffic lights) seldom labeled Places can be detected from location traces Activity information can help (if available) 2.2.2012 36

Place Identification Place Identification = the process of detecting places from data A data analysis step with four steps Preparation: clean data, transform data Preprocessing: making data ready for analysis Analysis: performing the actual analysis Post-processing: refining the results Additionally a labeling step Assign semantics with the detected places Can take place before or after analysis 2.2.2012 37

Labeling Common choice is to prompt the user to label a place after it has been detected Alternative to label first and learn the places automatically based on the labels Some labels can be assigned automatically Geographic databases can be used to mine information about the type of building Time information can be used to identify home and workplace Different modalities: text, photo, photo + text 2.2.2012 38

Detecting Places Overview Most place detection algorithms operate on coordinate data Pruning: remove measurements that are unlikely to be meaningful Clustering: apply spatial clustering on the data Post-processing: determine which clusters are likely to correspond to meaningful places Spatial criteria: matching against Geo-databases, considering size of clusters etc. Temporal criteria: requiring a minimum stay duration 2.2.2012 39

Detecting Places Velocity Pruning Measurements where the user is moving are unlikely to correspond to significant places Velocity can be used to prune measurements and clustering applied on remaining data 2.2.2012 40

Place Detection Further Topics Coordinate algorithms unable to separate between different places within the same indoor space Radio fingerprinting based place detection uses stability of signal environment to detect places Current state-of-the-art in mobile phone based place detection Performance decreases in areas with limited signal environment Hybrid algorithms Combine coordinate-based techniques with radio fingerprinting based place detection 2.2.2012 41

Summary Spatial analysis refers to the process of inspecting geographical data Preprocessing: cleaning and preparing data for analysis Analysis: exploratory or confirmatory Post-processing: validating, pruning results Spatial clustering Grouping of similar (spatial) objects together Partitioning algorithms: divide data optimally to clusters Density-based algorithms: identify dense spatial regions 2.2.2012 42

Summary Place Representation of location that is consistent with the way people communicate location information Semantic / symbolic Place detection Process of identifying places from location measurements On coordinate data, can be solved using spatial clustering and temporal + spatial pruning 2.2.2012 43

Literature Ester, M.; Kriegel, H.-P.; Sander, J. & Xu, X., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), AAAI, 1996, 226-231 Sander, J.; Ester, M.; Kriegel, H.-P. & Xu, X., Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications, Data Mining and Knowledge Discovery, 1998, 2, 169-194 Zhou, C.; Frankowski, D.; Ludford, P.; Shekhar, S. & Terveen, L., Discovering Personally Meaningful Places: An Interactive Clustering Approach, ACM Transactions on Information Systems, 2007, 25, 12 Ashbrook, D. & Starner, T., Learning significant locations and predicting user movement with GPS, Proceedings of the 6th International Symposium on Wearable Computers (ISWC), IEEE, 2002, 101-108 Kang, J.; Welbourne, W.; Stewart, B. & Borriello, G., Extracting places from traces of locations, Proceedings of the 2nd ACM international workshop on Wireless mobile applications and services on WLAN hotspots (WMASH), ACM Press, 2004, 110-118 2.2.2012 44

Literature Liao, L.; Fox, D. & Kautz, H., Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields, International Journal of Robotics Research, 2007, 26, 119-134 Marmasse, N. & Schmandt, C., A user-centered location model, Personal and Ubiquitous Computing, 2002, 6, 318-321 Nurmi, P. & Bhattacharya, S., Identifying Meaningful Places: The Nonparametric Way, Proceedings of the 6th International Conference on Pervasive Computing (Pervasive), Springer, 2008, 5013, 111-127 Tuan, Y.-F., Space and Place: The Perspective of Experience, University of Minnesota Press, 2001 Relph, E., Place and Placelessness, Pion Books, 1976 Han, J.; Kambar, M. & Tung, A. K. H., Spatial Clustering Methods in Data Mining: A Survey, Geographic Data Mining and Knowledge Discovery, Taylor & Francis, 2001 2.2.2012 45