Big data and NoSQL databases



Samankaltaiset tiedostot
Information on preparing Presentation

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

Capacity Utilization

Other approaches to restrict multipliers

Uusi Ajatus Löytyy Luonnosta 4 (käsikirja) (Finnish Edition)

Choose Finland-Helsinki Valitse Finland-Helsinki

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

Efficiency change over time

Results on the new polydrug use questions in the Finnish TDI data

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

FinFamily PostgreSQL installation ( ) FinFamily PostgreSQL

The CCR Model and Production Correspondence

7. Product-line architectures

Network to Get Work. Tehtäviä opiskelijoille Assignments for students.

Voice Over LTE (VoLTE) By Miikka Poikselkä;Harri Holma;Jukka Hongisto

Seminar on big data management

TIETEEN PÄIVÄT OULUSSA

Alternative DEA Models

MEETING PEOPLE COMMUNICATIVE QUESTIONS

Nuku hyvin, pieni susi -????????????,?????????????????. Kaksikielinen satukirja (suomi - venäjä) ( (Finnish Edition)

Hankkeen toiminnot työsuunnitelman laatiminen

CIO muutosjohtajana yli organisaatiorajojen

AYYE 9/ HOUSING POLICY

TU-C2030 Operations Management Project. Introduction lecture November 2nd, 2016 Lotta Lundell, Rinna Toikka, Timo Seppälä

MUSEOT KULTTUURIPALVELUINA

National Building Code of Finland, Part D1, Building Water Supply and Sewerage Systems, Regulations and guidelines 2007

JA CHALLENGE Anna-Mari Sopenlehto Central Administration The City Development Group Business Developement and Competence

Innovative and responsible public procurement Urban Agenda kumppanuusryhmä. public-procurement

WAMS 2010,Ylivieska Monitoring service of energy efficiency in housing Jan Nyman,

EVALUATION FOR THE ERASMUS+-PROJECT, STUDENTSE

Curriculum. Gym card

Power BI Tech Conference Power BI. #TechConfFI. Johdanto

FinFamily Installation and importing data ( ) FinFamily Asennus / Installation

Returns to Scale II. S ysteemianalyysin. Laboratorio. Esitelmä 8 Timo Salminen. Teknillinen korkeakoulu

Skene. Games Refueled. Muokkaa perustyyl. for Health, Kuopio

Uusia kokeellisia töitä opiskelijoiden tutkimustaitojen kehittämiseen

Olet vastuussa osaamisestasi

7.4 Variability management

Salasanan vaihto uuteen / How to change password

Capacity utilization

anna minun kertoa let me tell you

Master's Programme in Life Science Technologies (LifeTech) Prof. Juho Rousu Director of the Life Science Technologies programme 3.1.

Security server v6 installation requirements

Infrastruktuurin asemoituminen kansalliseen ja kansainväliseen kenttään Outi Ala-Honkola Tiedeasiantuntija

TIEKE Verkottaja Service Tools for electronic data interchange utilizers. Heikki Laaksamo

Use of spatial data in the new production environment and in a data warehouse

Data Quality Master Data Management

Security server v6 installation requirements

RANTALA SARI: Sairaanhoitajan eettisten ohjeiden tunnettavuus ja niiden käyttö hoitotyön tukena sisätautien vuodeosastolla

VUOSI 2015 / YEAR 2015

BLOCKCHAINS AND ODR: SMART CONTRACTS AS AN ALTERNATIVE TO ENFORCEMENT

toukokuu 2011: Lukion kokeiden kehittämistyöryhmien suunnittelukokous

koiran omistajille ja kasvattajille 2013 for dog owners and breeders in 2013

Paikkatiedon semanttinen mallinnus, integrointi ja julkaiseminen Case Suomalainen ajallinen paikkaontologia SAPO

Enterprise Architecture TJTSE Yrityksen kokonaisarkkitehtuuri

Constructive Alignment in Specialisation Studies in Industrial Pharmacy in Finland

ProAgria. Opportunities For Success

Tietorakenteet ja algoritmit

1. Liikkuvat määreet

Basic Flute Technique

ECVETin soveltuvuus suomalaisiin tutkinnon perusteisiin. Case:Yrittäjyyskurssi matkailualan opiskelijoille englantilaisen opettajan toteuttamana

DIGITAL MARKETING LANDSCAPE. Maatalous-metsätieteellinen tiedekunta

Information on Finnish Language Courses Spring Semester 2018 Päivi Paukku & Jenni Laine Centre for Language and Communication Studies

HITSAUKSEN TUOTTAVUUSRATKAISUT

Mat Seminar on Optimization. Data Envelopment Analysis. Economies of Scope S ysteemianalyysin. Laboratorio. Teknillinen korkeakoulu

A new model of regional development work in habilitation of children - Good habilitation in functional networks

make and make and make ThinkMath 2017

Perinteisesti käytettävät tiedon (datan) tyypit

Metal 3D. manufacturing. Kimmo K. Mäkelä Post doctoral researcher

Miehittämätön meriliikenne

Kysymys 5 Compared to the workload, the number of credits awarded was (1 credits equals 27 working hours): (4)

16. Allocation Models

Gap-filling methods for CH 4 data

Statistical design. Tuomas Selander

Miksi Suomi on Suomi (Finnish Edition)

Sisällysluettelo Table of contents

812336A C++ -kielen perusteet,

TM ETRS-TM35FIN-ETRS89 WTG

CASE POSTI: KEHITYKSEN KÄRJESSÄ TALOUDEN SUUNNITTELUSSA KETTERÄSTI PALA KERRALLAAN

1.3Lohkorakenne muodostetaan käyttämällä a) puolipistettä b) aaltosulkeita c) BEGIN ja END lausekkeita d) sisennystä

Algorithms and Systems on big data management

Informaatioteknologia vaikuttaa ihmisten käyttäytymiseen ja asenteisiin

Genome 373: Genomic Informatics. Professors Elhanan Borenstein and Jay Shendure

The role of 3dr sector in rural -community based- tourism - potentials, challenges

Osallistujaraportit Erasmus+ ammatillinen koulutus

ALOITUSKESKUSTELU / FIRST CONVERSATION

LANSEERAUS LÄHESTYY AIKATAULU OMINAISUUDET. Sähköinen jäsenkortti. Yksinkertainen tapa lähettää viestejä jäsenille

Collaborative & Co-Creative Design in the Semogen -projects

SELL Student Games kansainvälinen opiskelijaurheilutapahtuma

Arkkitehtuuritietoisku. eli mitä aina olet halunnut tietää arkkitehtuureista, muttet ole uskaltanut kysyä

Bounds on non-surjective cellular automata

Competitiveness with user and customer experience

Office 2013 ja SQL Server 2012 SP1 uudet BI toiminnallisuudet Marko Somppi/Invenco Oy

ESITTELY. Valitse oppilas jonka haluaisit esitellä luokallesi ja täytä alla oleva kysely. Age Grade Getting to school. School day.

Viestintään tarvitaan tiedon jakamista tietotyöläisten kesken Ville Hurnonen

Information on Finnish Language Courses Spring Semester 2017 Jenni Laine

Tarua vai totta: sähkön vähittäismarkkina ei toimi? Satu Viljainen Professori, sähkömarkkinat

Windows Phone. Module Descriptions. Opiframe Oy puh Espoo

Get Instant Access to ebook Kasvuyritys PDF at Our Huge Library KASVUYRITYS PDF. ==> Download: KASVUYRITYS PDF

CHEM Masters Kirsi Heino Information specialist Learning center beta

Transkriptio:

Big data and NoSQL databases Seminar on big data management Lecturer: Spring 2016 25.1.2016 1

Information on preparing Presentation and Report Goals for presentation and report are different: 1. Presentation: Let the audience to understand your topic; 2. Report: Show your own critical thinking and new ideas.

Contents of Presentation (Length: 35-40 minutes) 1. Introduction: please make a clear introduction 1.1 Why you are interested in this topic: what kind of problems do you hope to solve? 1.2 How had the problem been studied before? 1.3 What is the application of this problem for big data? 2. Related works: 2.1 Make sure you leave sufficient time to present all related prior work. Do not assume that the audience knows the prior work, 2.2 Present it on an intuitive level. 25.1.2016 3

Contents of Presentation (Cont.) 3 Main algorithms and contributions 3.1 Show the main solutions of the paper(s). 3.2 Present it with examples. The examples are quite important for understanding. 4. Your own comments and conclusion 4.1 Present your own comments about the paper(s) 4.2 It would be very good to identify the weak points of the paper(s) after your critical thinking. 25.1.2016 4

Contents of Report (6-8 pages, Single column) 1. What are the research problems? 2. What are the strengths of the paper(s)? 3. What are the main weaknesses of the paper(s)? 4. If you were to solve this problem, what would you do? 5. Why do you like/dislike the paper(s)? 6. Conclusion and summary of your report. 25.1.2016 5

Opponent Carefully listen to the presentation Ask questions after the presentation Complete an opponent assessment form and submit it to the teacher after the presentation 25.1.2016 6

Big data and NoSQL databases 25.1.2016 7

Data storage and history Before-1950s Data was stored as paper records Lot of time was wasted. e.g. when searching. Therefore inefficient.

Magnetic tapes and hard disk 1950s and early 1960s: Data processing using magnetic tapes for storage Late 1960s and 1970s: Hard disks allow direct access to data Data stored in files 25.1.2016 9

Drawbacks of file system Each program has its own data format Programs are written in different languages, and so cannot easily access each other s files. Any new requirement needs a new program 25.1.2016 10

Database Approach 1960 s Network databases 1970 s Relational databases 1990 s Object-oriented and object-relational 1995+ XML, Mobile, GeoDB, Embedded DB 2005+ NoSQL DB, NewSQL DB 25.1.2016 11

History of databases: Turing awards 1973 Charles W. Bachman 1981 Edgar F. Codd 1998 Jim Gray 2014 Michael Stonebraker 12

History of databases: Turing awards Distributed databases and transaction Object-relational model, column stores, Modern databases Network databases Relational databases 2014 Michael Stonebraker 1998 Jim Gray 1973 Charles W. Bachman 1981 Edgar F. Codd 13

Network Model Physical file pointers are used to model the relations between files Most suitable for large databases with well-defined queries and well-defined applications 25.1.2016 14

Relational model E. F Codd introduced the relational model in 1970 25.1.2016 15

Relational model Support relational algebra and operations Data and program are separated Improved data sharing and better integration DB2, Oracle and SQL server are the most prominent commercial DBMS products 25.1.2016 16

Object oriented data model (1990 s) The purpose of OODBMS is to store object-oriented programming objects in a database without having to transform them into relational format 25.1.2016 17

Object-relational model Extend the relational data model by including object orientation Allow attributes of tuples to have complex types, including non-atomic values such as nested relations 25.1.2016 18

Big Data Challenge

5V s of big data Volume TB PB EB Variety Text, audio, video Velocity Real time Operational / Analytic Applications Value Extract Value from big data, complex Analytics Veracity Biases, noise and abnormality in data.

Limitation for relational databases(1) Different Types of Data: Data Variety

Limitation for relational databases(2) What are Big Analytics Not only simple group by aggregation,but also Machine leaning, artificial intelligence Data mining natural language processing Social network analysis and search

Aster Data works on Graph What are Big Analytics

Limitation for relational databases(3) Design for relational data, but not suitable for Graph data,geo-spatial data,unstructured data Limited Scalability No RDBMS has been deployed onto a cluster of more than 1000 nodes Separation of Data Storage and Data Analytics Data migration Difficulty for parallel

Limitation for relational databases(4) Extending relational database Relational table sharding Depending on the program Data size increase, need resharding De-normalization for relational table to improve the performance Increase more redundancy data Increase the cost to maintain data consistence Relational databases cannot solve those challenges. We need new types of databases

NoSQL databases 25.1.2016 26

NoSQL DEFINITION: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, opensource and horizontally scalable Non-SQL or Not only SQL Watch a video about NoSQL from Jens Dittrich: Say No! No! and No! CIDR 2013 25.1.2016 27

Types and examples of NoSQL databases Types Column Document Key-value: Graph Multi-model Examples Accumulo, Cassandra, Druid, HBase, Vertica HyperDex, Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB Aerospike, CouchDB, Dynamo, FairCom c-treeace, FoundationDB, HyperDex, MemcacheDB, MUMPS, Oracle NoSQL databases Allegro, InfiniteGraph, MarkLogic, Neo4J, OrientDB, Virtuoso, Stardog Alchemy Database, ArangoDB, CortexDB, FoundationDB, MarkLogic, OrientDB 25.1.2016 28

Column stores A column-oriented DBMS is a database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data. This column-oriented DBMS has advantages for data warehouses, clinical data analysis, customer relationship management (CRM) systems, and library card catalogs, and other ad hoc inquiry systems 25.1.2016 29

Example of column stores RowId EmpId Name Age 1 123 Anna 34 2 456 Mikko 30 3 789 Emilia 44 Row-oriented storage: 1:123,Anna,34; 2:456,Mikko,30;3:789,Emilia,44 Column-oriented storage: 123:1,456:2,789:3; Anna:1, Mikko:2,Emilia:3;34:1,30:2,44:3 25.1.2016 30

Key-value stores Key-value (KV) stores use the associative array as their fundamental data model. In this model, data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection. Henkilön nimi / Esityksen nimi 25.1.2016 31

Example of Key-value stores RowId EmpId Name Age 1 123 Anna 34 2 456 Mikko 30 3 789 Emilia 44 1: (123,Anna,34); 2: (2,456,Mikko,30); 3: (789,Emilia,44) 25.1.2016 32

Insertion of a column and a record in Key-value stores RowId EmpId Name Age Salary 1 123 Anna 34 2 456 Mikko 30 3 789 Emilia 44 4 147 Joha 28 3000 1: (123,Anna,34); 2: (2,456,Mikko,30); 3: (789,Emilia,44); 4: (147,Joha,28,3000) 25.1.2016 33

Document store The central concept of a document store is the notion of a "document". Encodings in use include XML, YAML, and JSON as well as binary forms like BSON. Documents are addressed in the database via a unique key that represents that document. 25.1.2016 34

Example of document store University of Helsinki Yliopistonkatu 4, 00100 Helsinki Finland XML: <contact> <company> Universtiy of Helsinki </company> <address> Yliopistonkatu 4 </address > <city>helsinki</city> <zip> 00100 </zip> <country>finland</country> </contact> JSON: contact": { company": "Universtiy of Helsinki", " address ": " Yliopistonkatu 4 ", city": " Helsinki ", zip": "00100, country : Finland }, 25.1.2016 35

Graph stores Designed for graph data Applications: social relations, public transport links, road maps or network topologies, etc. 25.1.2016 36

Multi-model stores Support multiple data models against a single, integrated backend: Document, graph, relational, and key-value models are examples of data models Database Key-value SQL Document Graph Object Transacti on OrientDB Yes Yes Yes Yes Yes Full ACID, even distributed CouchDB Yes Yes Yes No Yes Marklogic Yes Yes Yes Yes No Full ACID 25.1.2016 37

Summary Relational databases is very successful to manage table and relational data, but it has limitations for managing big data. NOSQL databases is a general term, which includes five types of data stores. NOSQL database are starting to gain market traction 25.1.2016 38