Moving Object Databases
Petteri Nurmi (with contributions by Sourav Bhattacharya)
29.11.2016
Questions
- What is a spatial database, and what is a moving object database?
- What are trajectories, and how are they represented?
- What is trajectory simplification, and how can we perform it?
- What is trajectory segmentation, and how can we segment trajectories?
- What is the role of trajectory analysis in location-awareness?
Spatial Databases
- Extension of relational databases to represent and query geometries
- Data management layer for GIS
- Other application areas include VLSI design, molecular biology, and 3D modeling
- Main abstractions:
  - Points: individual location measurements
  - Lines: curves in space (LineString in KML)
    - correspond to routes (or pathways) for moving objects
    - roads, rivers, telephone cables, and similar objects can be stored and represented as lines
  - Regions: enclosed areas of the space (Polygon in KML)
    - can have holes and consist of disjoint pieces
    - consider, e.g., a representation of the USA (Alaska and Hawaii)
Spatial Databases: Abstractions
- Spatial abstractions are known as geometries
- International standard for geometries defined by the Open Geospatial Consortium (OpenGIS) and ISO: http://www.opengeospatial.org/standards/sfa
- Main abstractions:
  - Point and Line as before
  - Curve: collection of points that are connected
  - LineString: special case of a curve
  - LinearRing: Curve/LineString that is closed
  - Polygon: collection of LinearRings
  - Surface: collection of polygons
- Spatial databases differ in their support: some support all geometries, some none, and some only a small subset
- The KML reference language follows the same standard
Spatial Databases: Partitions
- A partition is a subdivision of the overall geographical area into areas that are required to be disjoint
  - States of a country
  - GSM cells of a mobile network
- Voronoi tessellation (or diagram): partitioning of a plane into convex polygons such that
  1. Each polygon contains exactly one seed point
  2. Every point in a polygon is closer to that polygon's seed point than to any other seed point
- Algorithmic methods for determining a partition (details out of scope for the course):
  - Lloyd's algorithm (generalized k-means)
  - Fortune's algorithm
  - Bowyer-Watson algorithm
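The second Voronoi property gives a direct membership test: a point belongs to the cell of its nearest seed. A minimal sketch, assuming planar coordinates and Euclidean distance; the function name `voronoi_cell` and the seed coordinates are illustrative and not taken from the lecture.

```python
import math

def voronoi_cell(point, seeds):
    """Return the index of the seed whose Voronoi cell contains `point`."""
    # By definition, the containing cell is that of the nearest seed.
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))

# Hypothetical seed points
seeds = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
print(voronoi_cell((1.0, 1.0), seeds))
print(voronoi_cell((6.0, 6.0), seeds))
```

This only answers point-in-cell queries by brute force; it does not construct the cell polygons, which is what Fortune's or the Bowyer-Watson algorithm would be used for.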
Spatial Databases: Networks
- A network is a graph embedded in the plane
  - Nodes correspond to point objects (individual locations)
  - Edges are represented using line objects
- The most common type of graph in spatial analysis is the road network
  - OpenStreetMap (OSM) provides XML files of road networks around the world
  - Nodes/vertices correspond to intersections
  - Edges are represented as line segments (collections of points)
- Discussed in more detail later during the course
Spatial Algebra and Queries
- Spatial databases define numerical operations on spatial abstractions
  - Operations on individual objects, e.g., length, size of area, contour, length of contour
  - Operations on sets of objects
- A collection of spatial objects together with the operations supported on them is referred to as a spatial algebra
- The OpenGIS standard defines several operations
  - Set operations: intersect, union, set difference
  - Spatial relationships: contains, within, crosses, touches, overlaps
  - Equality and inequality
Examples of Spatial Queries
- Region queries: return all restaurants within 5 km of the current location
  - SQL pseudocode:
      SELECT r.name, r.point FROM restaurants r WHERE distance(r.point, current_location) < 5000
  - Top-k restaurants by distance can be queried similarly
- Spatial object queries:
  - Return the intersection of two geometries:
      SELECT intersect(a.shape, b.shape)
  - Determine whether area A is contained within area B:
      SELECT within(a.shape, b.shape)
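The region query and its top-k variant can be mimicked outside a database. A minimal sketch, assuming locations are (latitude, longitude) pairs in degrees and using the haversine great-circle distance; the restaurant names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_radius(items, centre, radius_m):
    """Region query: names of all items closer than radius_m to centre."""
    return [name for name, pos in items if haversine_m(pos, centre) < radius_m]

def top_k(items, centre, k):
    """Top-k query: names of the k items nearest to centre."""
    ranked = sorted(items, key=lambda item: haversine_m(item[1], centre))
    return [name for name, _ in ranked[:k]]

# Hypothetical data: (name, (lat, lon))
restaurants = [
    ("Cafe A", (60.1710, 24.9410)),
    ("Bistro B", (60.2050, 24.9620)),
    ("Diner C", (60.4500, 25.1000)),
]
current_location = (60.1699, 24.9384)

print(within_radius(restaurants, current_location, 5000))
print(top_k(restaurants, current_location, 1))
```

Both functions scan every item; a spatial index is what lets a database avoid that scan.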
Spatial Indexing
- Execution of spatial queries requires indexes that take advantage of spatial relationships
- The most common type of index is a grid
  - Simple to implement, but its effectiveness depends on the number of points in a cell
- Another common scheme is the k-d tree
  - Binary tree where each node is a k-dimensional point
  - Each level of the tree splits the data according to one dimension, alternating between dimensions
- Quadtree: a variant in which each level splits on two dimensions
- Figure source: https://en.wikipedia.org/wiki/K-d_tree
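A grid index can be sketched in a few lines: points are bucketed by cell, and a radius query inspects only the cells overlapping the query's bounding box. This is a minimal sketch assuming planar coordinates; the class name `GridIndex` and its methods are illustrative.

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform grid over the plane; each cell holds the points that fall in it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, x, y, item):
        self.cells[self._cell(x, y)].append((x, y, item))

    def query_radius(self, x, y, r):
        """Return items within distance r of (x, y)."""
        cx0, cy0 = self._cell(x - r, y - r)
        cx1, cy1 = self._cell(x + r, y + r)
        hits = []
        # Only cells overlapping the query's bounding box are inspected.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for px, py, item in self.cells.get((cx, cy), ()):
                    if math.hypot(px - x, py - y) <= r:
                        hits.append(item)
        return hits

index = GridIndex(cell_size=10)
index.insert(1, 1, "a")
index.insert(12, 3, "b")
index.insert(50, 50, "c")
print(index.query_radius(0, 0, 5))
```

The cost of a query grows with the number of points stored in the inspected cells, which is the dependence on cell occupancy noted above.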
Spatial Indexing: R-tree
- Spatial indexing scheme that stores objects as a balanced search tree of rectangles
  - Groups of objects are represented through their minimum bounding rectangle
  - Internal nodes split subgroups hierarchically into regions with constrained geographical area
- The R-tree and its variants are the most widely used spatial indexing scheme
  - Very efficient for distance queries and nearest-neighbor queries, so they can be used to optimize the performance of many location-aware algorithms
- Variants: R+ tree, R* tree
  - R+: avoids overlap between regions for faster query performance, at a higher storage cost
  - R*: minimizes coverage and overlap by using a different split heuristic in the R-tree insert operation
Spatial Indexing: Space Filling Curves
- The performance of the R-tree and related indexing schemes depends on the quality of the rectangular splits
- Space-filling curves
  - Curves that visit each point in a k-dimensional grid exactly once without crossing themselves
  - Constructed recursively; they are examples of fractal curves
- Hilbert R-tree
  - The path of a space-filling curve defines a linear ordering of points (e.g., the curves H_1 and H_2 in the slide figure)
  - The Hilbert value of a rectangle is the order (in a Hilbert curve) of its central point
  - The Hilbert R-tree uses Hilbert values to organize the structure of the index
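The linear ordering defined by a Hilbert curve can be computed with the widely used bitwise conversion from grid coordinates to curve position. A minimal sketch, assuming an n x n grid with n a power of two; the function names are illustrative, and taking the rectangle's centre by integer division is a simplifying assumption.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve filling an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_value(rect, n):
    """Hilbert value of a rectangle (xmin, ymin, xmax, ymax): index of its centre cell."""
    xmin, ymin, xmax, ymax = rect
    return hilbert_index(n, (xmin + xmax) // 2, (ymin + ymax) // 2)

# Hypothetical rectangles on a 4 x 4 grid, ordered as a Hilbert R-tree would order them
rects = [(0, 0, 1, 1), (2, 0, 3, 1), (0, 2, 1, 3)]
print(sorted(rects, key=lambda r: hilbert_value(r, 4)))
```

Sorting by Hilbert value tends to place spatially close rectangles next to each other, which is the property the Hilbert R-tree relies on when grouping entries into nodes.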
R-trees, example
- Which restaurants are within 5 km of me?
- If the region we are looking for is within A, we only need to look through {a, b, c, d} (and their child nodes)
[Figure: rectangles a-i grouped into bounding regions A, B, C, D, with the corresponding tree]
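The pruning in this example can be sketched with a hand-built two-level tree. This is a minimal sketch, not a full R-tree: it has no insertion, splitting, or balancing, and the rectangle coordinates and node names are hypothetical. Rectangles are (xmin, ymin, xmax, ymax) tuples.

```python
def intersects(a, b):
    """True if rectangles a and b overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def bounding_box(rects):
    """Minimum bounding rectangle of a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

class Node:
    def __init__(self, name, entries, is_leaf):
        self.name = name
        self.entries = entries  # (rect, label) pairs in a leaf, Node objects otherwise
        self.is_leaf = is_leaf
        rects = [e[0] for e in entries] if is_leaf else [e.box for e in entries]
        self.box = bounding_box(rects)

def search(node, query, visited):
    """Return labels of objects intersecting query; record visited node names."""
    visited.append(node.name)
    if node.is_leaf:
        return [label for rect, label in node.entries if intersects(rect, query)]
    hits = []
    for child in node.entries:
        if intersects(child.box, query):  # skip subtrees whose MBR misses the query
            hits.extend(search(child, query, visited))
    return hits

A = Node("A", [((0, 0, 1, 1), "a"), ((2, 0, 3, 1), "b"),
               ((0, 2, 1, 3), "c"), ((2, 2, 3, 3), "d")], is_leaf=True)
B = Node("B", [((10, 10, 11, 11), "e"), ((12, 10, 13, 11), "f"),
               ((10, 12, 11, 13), "g"), ((12, 12, 13, 13), "h")], is_leaf=True)
root = Node("root", [A, B], is_leaf=False)

visited = []
hits = search(root, (1.5, 1.5, 2.5, 2.5), visited)
print(hits, visited)
```

Because the query rectangle lies inside the bounding rectangle of A, node B is never opened.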
Spatio-Temporal Databases: Moving Object Databases
- A spatio-temporal database that manages trajectories of mobile objects, e.g., people, vehicles, and containers
- Moving objects: points stored at discrete instants of time together with motion vectors (velocity and heading)
- Moving regions: areas that can shrink or grow over time, e.g., movements of glaciers, weather, and oil spills
- Requires effective temporal indexing
[Figure: trajectory plotted against x, y, and time axes]
Trajectory
- Refers to the path along which an object moves
- Defined as an ordered, time-stamped sequence of (changing) locations: (t_1, x_1, y_1), (t_2, x_2, y_2), ...
- Tuples with more elements can naturally also be considered
  - E.g., each sample can contain altitude, orientation, and velocity
- The route of a trajectory is its projection onto the X-Y plane
  - Spatial representation of the trajectory irrespective of the temporal component
  - In analysis, often aligned with the route network using map matching techniques (discussed in a later lecture)
Trajectory Representation
- At the current time instant, the object has an actual position
- The actual trajectory is a function of time, defined up to the current time instant
- Measurements are obtained only at discrete instants, so interpolation is required to connect the dots
- Sampling is typically done periodically, with a fixed time period
- Sensed trajectory: built from the measurements s_1, s_2, ..., s_R, where s_R is the most recent sensed measurement (s_8 = s_R in the slide figure)
  - A continuous, piecewise linear function
  - Defined for the time interval from the first measurement to the most recent one
- Spatio-temporal line segment: the segment between two successive measurements, i.e., linear interpolation
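Evaluating the sensed trajectory at an arbitrary time amounts to linear interpolation within the spatio-temporal line segment that contains that time. A minimal sketch, assuming samples are (t, x, y) tuples sorted by strictly increasing time; the function name `position_at` is illustrative.

```python
from bisect import bisect_right

def position_at(trajectory, t):
    """Interpolated (x, y) position at time t on a piecewise linear trajectory."""
    times = [sample[0] for sample in trajectory]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sensed time interval")
    i = bisect_right(times, t) - 1          # last sample with time <= t
    if i == len(trajectory) - 1:            # t is exactly the most recent sample
        return trajectory[-1][1:]
    (t0, x0, y0), (t1, x1, y1) = trajectory[i], trajectory[i + 1]
    w = (t - t0) / (t1 - t0)                # fraction of the segment elapsed
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))

# Hypothetical sensed trajectory: (t, x, y)
trajectory = [(0, 0, 0), (10, 10, 0), (20, 10, 20)]
print(position_at(trajectory, 5))
print(position_at(trajectory, 15))
```

The function is only defined between the first and the most recent measurement, matching the time interval of the sensed trajectory.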
Moving Object Databases
- Suppose we are tracking all taxis in a city
- If each taxi's position is updated frequently:
  - Error in location information is kept low
  - Update load can be very high
- Infrequent updates result in high position error
- To overcome this problem, MODs store motion vectors (i.e., speed and direction) together with periodic location updates
  - The position can be estimated from these using dead reckoning
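Dead reckoning extrapolates the last reported position along the stored motion vector. A minimal sketch, assuming planar coordinates in metres, speed in metres per second, and heading measured in degrees clockwise from north (the positive y axis); these conventions and the function name are assumptions made for illustration.

```python
import math

def dead_reckon(x, y, speed, heading_deg, dt):
    """Estimate the position dt seconds after the last update at (x, y)."""
    h = math.radians(heading_deg)
    # Heading is clockwise from north, so east is sin(h) and north is cos(h).
    return (x + speed * dt * math.sin(h), y + speed * dt * math.cos(h))

# A taxi last seen at the origin, driving east at 10 m/s, queried 5 s later
print(dead_reckon(0.0, 0.0, 10.0, 90.0, 5.0))
```

A new location update is only needed once the true position drifts too far from this estimate, which is how storing motion vectors reduces the update load.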
Moving Object Databases (cont.)
- Storing all sensed locations in a MOD requires a huge amount of space
  - A single object sampled at 1 Hz produces over 31 million GPS measurements per year (365 x 24 x 3600 = 31,536,000)
  - We might be tracking millions of objects
  - Searching becomes time-consuming
- To reduce the space requirement, trajectory simplification is often performed
  - Minimizes the number of vertices
  - The simplified trajectory does not deviate from the actual trajectory by more than an error bound
Line Simplification: Intuition
[Figure: a sensed trajectory with its location measurements, and the simplified trajectory with the measurements retained]
Trajectory Simplification: Error
- Error e_i = perpendicular distance from a point to the simplified line segment
- Line simplification strategy: the maximum error due to simplification should be bounded, i.e., max(e_1, e_2, ..., e_n) <= error threshold
[Figure: errors e_1, ..., e_6 between the sensed points and the simplified segments]
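The error measure and the bound check can be written down directly. A minimal sketch, assuming planar (x, y) points and measuring the distance to the infinite line through the segment's end points (an assumption; a distance to the clipped segment would differ for points beyond the ends). Function names are illustrative.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(dx, dy)
    if norm == 0:                      # degenerate segment: a and b coincide
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / norm

def within_error_bound(points, a, b, threshold):
    """True if replacing `points` by segment a-b keeps max(e_i) <= threshold."""
    return max(perpendicular_distance(p, a, b) for p in points) <= threshold

# Hypothetical sensed points between the retained end points (0, 0) and (4, 0)
skipped = [(1, 0.3), (2, -0.5), (3, 0.2)]
print(within_error_bound(skipped, (0, 0), (4, 0), threshold=0.5))
```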
Trajectory Error Bound
- The trajectory error bound is the maximum permissible error for a line simplification algorithm
- It is often application-specific
  - E.g., a location-based service (LBS) might need to know the trajectory of an object to within 500 meters (i.e., trajectory error bound = 500 meters)
Role of Trajectory Error Bound
- Energy-efficient tracking
  - through duty-cycled GPS sampling
- Reduction in data transmission costs
  - less data transmitted over the wireless link
  - further reduction of power consumption
- Reduction in space requirements for MODs
  - improves search time
Trajectory Simplification
- A significant amount of research relates to trajectory simplification, e.g., in
  - Computer graphics and vision
  - Computational geometry
  - Trajectory mining
- An optimal line/trajectory simplification algorithm outputs the minimum number of points that obeys the error threshold
- We will focus on three non-optimal algorithms
  - Turn-based segmentation
  - Douglas-Peucker algorithm
  - MDL-based algorithm
Turn-Based Segmentation
- One of the simplest approaches: segments trajectories based on significant changes in the direction of motion
  - Measure changes in azimuth
  - Segment the trajectory when the change exceeds a predefined threshold
- Often measured cumulatively to account for small-scale fluctuations and sensor errors
- Analogous to the turn detection used for improving the energy efficiency of tracking (Lecture II)
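The cumulative variant can be sketched as follows: heading changes are summed since the last cut, and a new segment starts once the sum exceeds the threshold. A minimal sketch, assuming planar (x, y) points and azimuth measured clockwise from north; resetting the sum at each cut is one possible design choice, and all names are illustrative.

```python
import math

def heading_deg(p, q):
    """Azimuth of the step p -> q in degrees, clockwise from north (+y)."""
    return math.degrees(math.atan2(q[0] - p[0], q[1] - p[1])) % 360

def turn_deg(h1, h2):
    """Signed smallest rotation from heading h1 to h2, in [-180, 180)."""
    return (h2 - h1 + 180) % 360 - 180

def turn_segments(points, threshold_deg):
    """Indices of the points kept as segment boundaries."""
    if len(points) < 3:
        return list(range(len(points)))
    keep = [0]
    cumulative = 0.0
    prev = heading_deg(points[0], points[1])
    for i in range(1, len(points) - 1):
        cur = heading_deg(points[i], points[i + 1])
        cumulative += turn_deg(prev, cur)     # small wiggles of opposite sign cancel out
        if abs(cumulative) > threshold_deg:
            keep.append(i)                    # significant turn: start a new segment here
            cumulative = 0.0
        prev = cur
    keep.append(len(points) - 1)
    return keep

# Hypothetical L-shaped path: north for two steps, then east for two steps
path = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
print(turn_segments(path, threshold_deg=45))
```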
Turn-Based Segmentation: Example
[Figure: turn-based segmentation example]
Douglas-Peucker Algorithm
- Probably the best-known heuristic algorithm for line simplification
- Off-line algorithm
  - The algorithm requires all data points before the simplification can be computed (batch process)
  - In addition to all data points, the algorithm requires an error threshold
- Non-optimal
  - There exist algorithms that output results with fewer data points (e.g., GRTS Sec by Lange et al.)
Douglas-Peucker: Overview
- Follows a divide-and-conquer paradigm
- Begins by generating a line segment between the first point s_1 and the last point s_n
- Finds the point s_i furthest from the line segment
- If the distance e to the furthest point is within the error bound, returns the points s_1, s_n as the simplification
- Otherwise, adds the point s_i to the simplification list and recursively calls Douglas-Peucker on {s_1, ..., s_i} and {s_i, ..., s_n}
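The recursion described above can be sketched as follows. A minimal sketch, assuming planar (x, y) points and the perpendicular distance to the line through the end points as the error; the function names and sample points are illustrative.

```python
import math

def _perp(p, a, b):
    """Distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / norm

def douglas_peucker(points, epsilon):
    """Simplify `points` so that no dropped point is further than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Interior point furthest from the segment first-last.
    i = max(range(1, len(points) - 1), key=lambda k: _perp(points[k], first, last))
    if _perp(points[i], first, last) <= epsilon:
        return [first, last]
    left = douglas_peucker(points[:i + 1], epsilon)
    right = douglas_peucker(points[i:], epsilon)
    return left[:-1] + right            # points[i] ends `left` and starts `right`

# Hypothetical sensed points
sensed = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(sensed, epsilon=0.5)
print(simplified, len(sensed) / len(simplified))
```

The second printed value is the reduction rate for this small input. The list slicing copies sub-lists at each level; an index-based version would avoid that at the cost of readability.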
Douglas-Peucker: Performance Evaluation
- Reduction rate: ratio of the number of sensed points to the number of points left after simplification; in our example, 30/5 = 6
- Time complexity: worst-case running time O(n^2)
Trajectory Simplification Algorithms
- Douglas-Peucker is an example of a top-down approach
  - The whole set of data is partitioned recursively
  - Different objective functions can be used:
    - Speed changes: keep the point with the highest speed variation
    - Time-ratio: use the distance to the interpolated point instead of the orthogonal projection
- Bottom-up approaches: neighboring data points are merged until given criteria are met
- Window-based approaches: a window is moved along the data and the points within the window are compressed
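The time-ratio objective replaces the orthogonal projection with the position the object would have had at the sample's own timestamp if it had moved along the simplified segment at constant speed. A minimal sketch, assuming (t, x, y) samples; the function name is illustrative.

```python
import math

def time_ratio_distance(p, a, b):
    """Distance from sample p to its time-interpolated position on segment a-b."""
    (t, x, y), (ta, xa, ya), (tb, xb, yb) = p, a, b
    w = (t - ta) / (tb - ta) if tb != ta else 0.0   # fraction of the time elapsed
    xi, yi = xa + w * (xb - xa), ya + w * (yb - ya)  # where the object "should" be
    return math.hypot(x - xi, y - yi)

# The sample lies exactly on the segment, so its perpendicular distance is 0,
# but it is far ahead of schedule: at t = 2 the interpolated position is (2, 0).
start, end = (0, 0, 0), (10, 10, 0)
sample = (2, 5, 0)
print(time_ratio_distance(sample, start, end))
```

The example shows why this measure matters for moving objects: a purely spatial error would treat the sample as redundant even though dropping it discards a speed change.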
MDL-based Trajectory Simplification
- Minimum Description Length (MDL) was introduced by Jorma Rissanen
- MDL is an information-theoretic approach
  - According to MDL, the best hypothesis for a given dataset is the one that leads to the best compression of the data
- The MDL principle can be applied to the trajectory simplification problem
- An optimal partition of a trajectory should possess two desirable properties:
  - Preciseness: the difference between the original trajectory and its simplification should be as small as possible
  - Conciseness: the number of trajectory partitions should be as small as possible
- Preciseness and conciseness contradict each other, so we need a trade-off between the two
MDL-based Trajectory Simplification (cont.)
- The MDL principle is used to select a particular partitioning of the trajectory; we refer to the partitioning as the hypothesis
- We evaluate each hypothesis with the MDL cost
- The MDL cost function is composed of
  - L(H): length in bits required to describe the hypothesis H
  - L(D|H): length in bits required to describe the data D when encoded with the help of the hypothesis H
- Example: for the points s_1, ..., s_5 and the hypothesis H that the simplified representation is the segment s_1 s_4, the MDL cost is L(H) + L(D|H)
MDL-based Trajectory Simplification (cont.)
[Figure: candidate segments from s_i to s_j; points s_{i+1}, ..., s_{i+4} = s_j labeled True or False]
- Follows a greedy approach; running time O(n)
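The greedy procedure can be sketched along the lines of the approximate partitioning in Lee et al. (see Literature): the segment from the current start point is extended one point at a time, and a cut is made as soon as describing the points with a single segment costs more bits than keeping the original segments. The cost terms below follow that paper as far as recalled here and should be treated as assumptions to check against it: L(H) is the log2 length of the hypothesis segment, and L(D|H) sums log2 of a perpendicular and an angular distance per original segment. Clamping distances to 1 before taking log2 is an implementation assumption.

```python
import math

def _length(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])

def _bits(d):
    return math.log2(max(d, 1.0))       # assumption: distances below 1 cost 0 bits

def _point_line(p, a, b):
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return _length(p, a)
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / norm

def _perp_dist(h0, h1, s0, s1):
    # Lehmer mean (order 2) of the end points' distances to the hypothesis line.
    l1, l2 = _point_line(s0, h0, h1), _point_line(s1, h0, h1)
    return 0.0 if l1 + l2 == 0 else (l1 * l1 + l2 * l2) / (l1 + l2)

def _angle_dist(h0, h1, s0, s1):
    hx, hy = h1[0] - h0[0], h1[1] - h0[1]
    sx, sy = s1[0] - s0[0], s1[1] - s0[1]
    h_len, s_len = math.hypot(hx, hy), math.hypot(sx, sy)
    if h_len == 0 or s_len == 0 or hx * sx + hy * sy < 0:
        return s_len                     # degenerate, or angle above 90 degrees
    return abs(hx * sy - hy * sx) / h_len  # s_len * sin(angle)

def mdl_par(pts, i, j):
    """L(H) + L(D|H) when pts[i..j] is replaced by the segment pts[i]-pts[j]."""
    cost = _bits(_length(pts[i], pts[j]))
    for k in range(i, j):
        cost += _bits(_perp_dist(pts[i], pts[j], pts[k], pts[k + 1]))
        cost += _bits(_angle_dist(pts[i], pts[j], pts[k], pts[k + 1]))
    return cost

def mdl_nopar(pts, i, j):
    """Cost of keeping every original segment between i and j (L(D|H) = 0)."""
    return sum(_bits(_length(pts[k], pts[k + 1])) for k in range(i, j))

def mdl_partition(pts):
    """Greedy scan returning the indices of the retained points."""
    keep, start, length = [0], 0, 1
    while start + length < len(pts):
        cur = start + length
        if mdl_par(pts, start, cur) > mdl_nopar(pts, start, cur):
            keep.append(cur - 1)         # the previous point becomes a cut point
            start, length = cur - 1, 1
        else:
            length += 1
    keep.append(len(pts) - 1)
    return keep

# Hypothetical path: east for three steps of 10 units, then north for three steps
corner_path = [(0, 0), (10, 0), (20, 0), (30, 0), (30, 10), (30, 20), (30, 30)]
print(mdl_partition(corner_path))
```

This naive version recomputes each cost from scratch, so it does not by itself achieve the O(n) running time stated above. Note also that no error threshold appears anywhere in the procedure.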
Comparison of Trajectory Simplification Algorithms
- The MDL-based approach does not use an error threshold, whereas Douglas-Peucker adapts to an input error bound
- Both algorithms are designed for offline line simplification
- Douglas-Peucker follows a divide-and-conquer paradigm, whereas the MDL-based algorithm follows a greedy approach
- Both algorithms are non-optimal
Example: EnTracked T
[Figure: system apparatus (Nokia N97, u-blox LEA-5H GPS receiver) and panels labeled Biking, Running, Driving, and Walking]
Example: Gesture Recognition
- Hand gestures can be decomposed into strokes using line segmentation techniques
Example: Navigation
- Navigation systems can use line/trajectory segmentation to determine when to play instructions
- Intuition:
  1. Find the optimal path
  2. Segment the path
  3. Associate each change of segment with a new instruction
Summary
- Spatial databases: extension of relational databases to spatial data
  - Define geometries, which are abstractions of geographic objects
  - Support spatial queries and operations (e.g., size of an area or length of a road segment)
- Spatial indexing is required to make queries efficient
  - Grid indexing
  - k-d trees
  - R-trees and variants: R+, R*, Hilbert R-tree
- An object's trajectory, sensed using a position sensor, is represented by a spatio-temporal piecewise linear function
- To improve the performance of a MOD, trajectory simplification is often carried out
  - Improves search time
  - Minimizes space requirements
  - Saves energy and cost on mobile devices
- Trajectory simplification techniques:
  - Douglas-Peucker: divide-and-conquer / top-down
  - MDL: greedy optimization to balance preciseness and conciseness
Literature
- Lange, R.; Farrell, T.; Dürr, F. & Rothermel, K. Remote real-time trajectory simplification. IEEE International Conference on Pervasive Computing and Communications (PerCom), 2009, 1-10
- Kjærgaard, M. B.; Bhattacharya, S.; Blunck, H. & Nurmi, P. Energy-efficient Trajectory Tracking for Mobile Devices. Proceedings of the 9th International Conference on Mobile Systems, Applications and Services (MobiSys), 2011, 307-320
- Lee, J.; Han, J. & Whang, K. Trajectory Clustering: A Partition-and-Group Framework. Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, Beijing, China, June 2007, 593-604