EMO Eetu Mäkelä
EMO SLDREAI
Purpose: a scalable linked data repository with extensible advanced indexing
Originally created for view-based and text search in the massively heterogeneous environment of CultureSampo
Scalable: should scale to billions of triples and allow clustering
Extensible advanced indexing: should allow efficient search using e.g. text patterns, transitive inferencing, geo-coordinates, temporal constraints and numeric ranges
Choice of EMO SLDREAI Architecture
While most RDF store data structures are based on B-trees originating from ER database indexing, EMO is based on the vector space model from IR (built on Lucene)
This is a tradeoff:
  1) EMO may lose on simple triple matching
  2) Writes to the store may be slower
But:
  1) Supports easy implementation and efficient integration of specialized indices
  2) Scaling and clustering are easy
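To make the tradeoff above concrete, here is a minimal sketch of an inverted index over triples, the core structure behind Lucene-style IR engines. It is an illustration only, not EMO's actual index layout: the class name `TripleIndex`, the choice of treating each subject as a "document", and the `("text", word)` term convention are all assumptions made for this example. It shows why a specialized index is cheap to integrate (it is just another kind of term) and why writes cost more (each triple updates several posting lists).

```python
from collections import defaultdict


class TripleIndex:
    """Toy inverted index (hypothetical, not EMO's real layout).

    Each subject is treated as a 'document'; each (predicate, object)
    pair is a term pointing at the subjects that carry it.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of subjects

    def add(self, s, p, o):
        # Plain triple term: one posting list per (predicate, object) pair.
        self.postings[(p, o)].add(s)
        # A 'specialized index' is just another kind of term. Here: the
        # lowercased words of the object, to support simple text search.
        for word in str(o).lower().split():
            self.postings[("text", word)].add(s)

    def search(self, *terms):
        """Return the subjects matching ALL terms (posting list intersection)."""
        sets = [self.postings.get(t, set()) for t in terms]
        if not sets:
            return set()
        return set.intersection(*sets)


index = TripleIndex()
index.add("ex:church1", "rdf:type", "ex:Church")
index.add("ex:church1", "rdfs:label", "Helsinki Cathedral")
index.add("ex:museum1", "rdf:type", "ex:Museum")
index.add("ex:museum1", "rdfs:label", "Helsinki City Museum")

# Combine a structural constraint with a text constraint in one lookup.
result = index.search(("rdf:type", "ex:Church"), ("text", "helsinki"))
print(result)
```

A type constraint and a text constraint are answered by the same intersection step, which is the kind of integration that is harder to get from separate B-tree triple indices.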
EMO SLDREAI Specialized Indices
Geo-coordinate search (both of objects and triples)
Temporal search (both of objects and triples)
Object baseform search
Numeric range search
Search based on transitive inference
Transitive text search (general and on a particular field)
Unified view
  In massively heterogeneous LD environments, there are usually multiple URIs for a single thing. These are unified using sameAs statements
  In a global search situation, it's usually bad to show these as multiple objects, so EMO provides a unified view of the index, where all equivalent URIs are replaced with one of them
Special indices are kept up to date as the store is modified
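The unified view described above can be sketched with a union-find (disjoint-set) structure. This is one common way to collapse equivalence classes and is offered as an assumption, not as EMO's actual mechanism. The class `UriUnifier`, the `ex:` URIs and the rule for picking the representative (the lexicographically smallest URI) are all hypothetical.

```python
class UriUnifier:
    """Union-find over URIs linked by sameAs statements (illustrative sketch)."""

    def __init__(self):
        self.parent = {}  # uri -> parent uri; a root is its own parent

    def find(self, uri):
        """Return the representative URI of the equivalence class of `uri`."""
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[uri] != root:
            self.parent[uri], uri = root, self.parent[uri]
        return root

    def same_as(self, a, b):
        """Record that `a` and `b` denote the same thing."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Arbitrary but deterministic choice: keep the smaller URI.
            keep, drop = sorted((ra, rb))
            self.parent[drop] = keep

    def canonical(self, term):
        """Map known URIs to their representative; leave other terms alone."""
        return self.find(term) if term in self.parent else term

    def unify(self, triples):
        """Rewrite subjects and objects so each thing appears under one URI."""
        return {(self.canonical(s), p, self.canonical(o)) for s, p, o in triples}


unifier = UriUnifier()
unifier.same_as("ex:a/Helsinki", "ex:b/Helsinki")
unifier.same_as("ex:b/Helsinki", "ex:c/Helsinki")

triples = [
    ("ex:b/Helsinki", "rdfs:label", "Helsinki"),
    ("ex:church1", "ex:locatedIn", "ex:c/Helsinki"),
]
unified = unifier.unify(triples)
print(sorted(unified))
```

After unification, a search result lists Helsinki once, regardless of which of the three equivalent URIs a source dataset used.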
EMO Specialized Indices: Geo-Coordinate Search: The distribution of churches in Southern Finland
EMO Specialized Indices: Geo-Coordinate Search of 17 million objects with coordinates
EMO Specialized Indices: Temporal Entity Search: Changes in beard fashion in the late 19th century
EMO Specialized Indices: Transitive Object Search: Changes in imports from Japan to Finland in the middle 20th century
EMO Specialized Indices: Text Search: What is the position of Lemminkäinen in Finnish culture? (A search for everything related to Lemminkäinen with explanations)
EMO Specialized Indices: Transitive Text Search:
EMO SLDREAI Scalability
Tested on a single machine with 2.4 billion triples, 282 million URIs, 266 million literals and 27.5 million blank nodes
In theory ready for clustering, but this has not been tested
Using EMO SLDREAI
Own API
Wrapper for use as a Jena Model / GraphStore
Wrapper for use as a Sesame Repository
EMO SPARQL Functionality
Uses Jena wrapper & ARQ
ARQ allows defining both custom functions and custom property functions
These are used to expose the advanced functionality of EMO SLDREAI
  Text search / baseform search
  Geo-coordinate search
  Transitive search
  URI unification
Some other functions have also been added
  Random sampling
  Robust label extraction
  Literal mangling
EMO HTTP RDF Server
SPARQL/SPARUL
  Uses Jena wrapper & Fuseki
SPARQL Graph Store protocol
  Uses the EMO RDFIO library, which picks the best bits from both Jena & Sesame
URILookup
  EMO usually cannot control the URIs stored in the repository
  Yet it would be nice to support the Follow Your Nose principle of Linked Data
  URILookup is given a URI as a parameter and returns the description of that URI as RDF, but also tags each new URI in the description with an rdfs:seeAlso link back to URILookup
  This way, Follow Your Nose can be bootstrapped with just a single link inside EMO
Search
  Custom search API allowing both simple queries and certain forms of complex queries that would be hard or inefficient to do in pure SPARQL
  Mapping queries (text search finds the genre of an artist, which is mapped using a SPARQL mapping query through an event to a location)
  Grouping queries (with optional algorithmic group reduction)
  Support for view-based search (each view being defined by a SPARQL query)
What's made possible by the EMO HTTP RDF Server? DEMO
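The URILookup behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the server's code: the endpoint address `LOOKUP_BASE`, the function names, the in-memory list of triples, and the decision to tag only subjects and objects (not predicates) are all hypothetical.

```python
from urllib.parse import quote

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
# Hypothetical endpoint address, used only for this example.
LOOKUP_BASE = "http://example.org/urilookup?uri="


def is_uri(term):
    # Simplifying assumption: anything that looks like an HTTP(S) URL is a
    # URI, everything else is a literal.
    return isinstance(term, str) and term.startswith(("http://", "https://"))


def describe(uri, triples, lookup_base=LOOKUP_BASE):
    """Return the triples describing `uri`, plus seeAlso links to URILookup.

    Every other URI mentioned in the description gets an rdfs:seeAlso link
    pointing back at the lookup service, so a client can keep following
    links even though the stored URIs themselves are not under our control.
    """
    description = [t for t in triples if t[0] == uri or t[2] == uri]
    mentioned = set()
    for s, _p, o in description:
        for term in (s, o):
            if is_uri(term) and term != uri:
                mentioned.add(term)
    links = [
        (other, RDFS_SEE_ALSO, lookup_base + quote(other, safe=""))
        for other in sorted(mentioned)
    ]
    return description + links


store = [
    ("http://example.org/church1", "http://example.org/locatedIn",
     "http://example.org/Helsinki"),
    ("http://example.org/church1",
     "http://www.w3.org/2000/01/rdf-schema#label", "Old Church"),
    ("http://example.org/museum1", "http://example.org/locatedIn",
     "http://example.org/Helsinki"),
]

for triple in describe("http://example.org/church1", store):
    print(triple)
```

Following the seeAlso link for the Helsinki URI returns its description, which in turn links onward to the museum, so a single entry link is enough to reach the rest of the graph.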
Allows fetching additional information and choosing which information to display, as needed
Allows choosing the concept space to select from, as needed (here, places within 10 km of the centre of Helsinki)
Allows adding concepts to one's own extension vocabularies
The added vocabulary can later be edited, for example, in SAHA:
EMO ARPA
EMO Maui integration allows an EMO repository to be used as a vocabulary for Maui (with dynamic constraining of the vocabulary by SPARQL)
  Uses the EMO baseform index
EMO SPARQL ARPA integration allows querying ARPA for automatic annotations from SPARQL
  Can be hooked back to the repository using SPARUL
EMO VMSAAC
EMO VMSAAC DEMO
EMO VMSAAC
The automatic subject indexer / named entity recognizer ARPA is connected back to the EMO server
  Both annotation and training
Annotations are fed into the uncertain-annotation review functionality of EMO's SAHA editor
This enables a natural cycle: once a core training set has been created by hand, automatic keyword suggestions are obtained every time new text material is indexed, AND these suggestions improve continuously as the work progresses