Open science and change in humanities research. Mikko Tolonen, Helsinki Computational History Group, University of Helsinki
Concept of open science: Open science refers to endeavours to advance open, shared and reproducible models of operation in scientific research. Open data refers to content and data in digital form that are free for everyone to use, modify and share. On the significance of open science, see the Academy of Finland & OKM ATT Open Science initiative - http://avointiede.fi/keskeinen-sanasto
Open access: 1. Open (raw) data 2. Open research data 3. Open publications
OPEN DATA PRINCIPLES: Not just publishing of end results. The research process is just as important. Methods and research data need to be open as well: transparency, reproduction, collaboration, new initiatives. Access to data is an institutional question. Using the accessed data, and tidying it up first, is a question of research. Mere linking and endless clicking is not research.
Digital humanities is about building ecosystems [Diagram: Humanities researcher, Memory organization and CS researcher, connected by flows labelled content feedback, data, method support, method evaluation and technical feedback]
Digital humanities research process: raw data; cleaning up data (80% of work); exploratory tools; results; research articles; understanding data. 80% of your time for data cleanup, another 80% for algorithms,
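To make the "cleaning up data" step concrete, here is a minimal sketch of one typical tidying task: turning free-text publication years from a library catalogue into integers. The sample values, the 1400-1999 range and the function names are illustrative assumptions, not taken from ESTC, Fennica or the group's actual workflows.

```python
import re
from typing import List, Optional

# Hypothetical raw catalogue values, invented for illustration only.
RAW_YEARS = ["1778.", "[1754?]", "printed in the year 1640", "s.a.", ""]

# A four-digit year between 1400 and 1999 that is not part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)(1[4-9]\d{2})(?!\d)")


def clean_year(raw: str) -> Optional[int]:
    """Return the first plausible publication year in the string, or None."""
    match = YEAR_PATTERN.search(raw)
    return int(match.group(1)) if match else None


def clean_years(values: List[str]) -> List[Optional[int]]:
    """Apply clean_year to every raw value, keeping the original order."""
    return [clean_year(value) for value in values]


print(clean_years(RAW_YEARS))  # prints [1778, 1754, 1640, None, None]
```

Values that cannot be parsed are kept as None rather than dropped, so the gaps in the source data stay visible in later exploratory steps.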
Digital materials: collaboration and open science are essential because of the workload involved
Change in the research culture of the humanities: individual researchers; research groups
2013 2017
Helsinki Computational History Group, https://comhis.github.io/. Computer scientists researching open workflows, algorithms and interfaces for humanities text and metadata. Linguists exploring the relationship between words and concepts. Historians interested in conceptual and actual historical processes.
How to maintain our identity and build something new? Objective: even when in between, staying within a research tradition. What is happening: from Comhis Collective to Helsinki Computational History Group.
Helsinki Computational History Group: Historians; CS / Data science; Linguists; Library. Collaborators: Aleksi, Ginter, Asko, Hannu, Eero, Jouni, Jukka, Osma, Turo, Risto
Helsinki Computational History Group: PI, Senior Researchers, Postdocs, Doctoral students, Research assistants
Grand aim: understanding public communication in early modern Europe
WP1: movement of ideas
- Metadata work based on several different library catalogues
- genres (poetry, pamphleteering); intellectual traditions (natural law tradition, ancient texts)
- text reuse: genres (historical works, quoting practices)
WP2: conceptual change
- concepts are crucial, but we are not jumping directly into this, for various reasons
- Theoretical underpinning (historians + linguists)
- Concepts as linguistic objects (linguists + historians + CS)
WP3: Research data releases
- ESTC
- Fennica
- Kungliga
- CERL
- ECCO text reuse (+ EEBO text reuse)
- Finnish Newspapers
WP4: easy-to-use tools for historians
- APIs
- shiny apps
- etc.
Cases overlap WPs. Methods are there, but they do not drive the research; the knowledge interest of history is at the centre.
Examples of progress of workflows so far [Diagram labels: ESTC; ECCO; Fennica; Kungliga; Finnish Newspapers; Metadata; ECCO text reuse; History in ESTC publication; Book printing in Finland; Book printing in Sweden and Finland, 1640-1828; Vaivainen + rural in Finnish; Analysing the spheres of public in late 18th-century Britain; Text reuse in Hume's History; Location, language and form of newspapers in Finland]
What really matters? 01 Communication 02 Negotiation 03 Grasping the relevance of change in the research culture
FINNISH NEWSPAPER COLLECTION 3,2M pages, 60% free web use 6M, 20% 130k, 100%
[Diagram: Memory org; Research team A text mining a data source (Finnish newspapers); Repository; Research team B. Labels: Transparency (data, methods, reporting); Reproducibility; Access and reuse] Everything is beautiful, right?
CASE OF HELSINGIN SANOMAT (1910-2000) Problem: The interpretation of privacy and copyright law at the Helsingin Sanomat end is quite strict. There is no exception for research use, which is on the one hand understandable as long as the data mining exception in the EU Digital Single Market strategy is not implemented. Solution: Aikakone???!!?? OR: We need much more support for implementing fair use clauses in research and teaching, and a sensible reading of the EU strategy. We also need methods to ensure that the data becomes available for research use. Signing binding agreements between researchers and the data holders is not a problem.
What is digital in digital humanities?
We were granted a 3 million euro H2020 grant; 900,000 euros to the University of Helsinki (together with Hannu Toivonen's group from computer science and the National Library of Finland). Other partners are from France, Austria and Germany.
The state of the art is moving beyond text searches. Why does Helsingin Sanomat, for example, start developing its own services instead of collaborating with academic research?
Anatomy of a book: text reuse in David Hume's History of England (1778), Chap. LIX (execution of Charles I)
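As a rough illustration of what detecting text reuse can mean in practice, the sketch below compares two passages by their shared word trigrams ("shingles"). This is a deliberately simple stand-in, not the method behind the ECCO text reuse data; the function names and the two sample sentences are invented for the example and are not quotations from Hume.

```python
import re
from typing import Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, n: int = 3) -> Set[Shingle]:
    """Return the set of overlapping word n-grams in a lower-cased text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_shingles(source: str, target: str, n: int = 3) -> Set[Shingle]:
    """Word n-grams that occur in both passages."""
    return shingles(source, n) & shingles(target, n)


def reuse_share(source: str, target: str, n: int = 3) -> float:
    """Fraction of the target's n-grams that also occur in the source."""
    target_shingles = shingles(target, n)
    if not target_shingles:
        return 0.0
    return len(target_shingles & shingles(source, n)) / len(target_shingles)


# Invented sample passages (assumption: not taken from any real edition).
earlier = "the king was brought to the scaffold before the palace"
later = "It is said the king was brought to the scaffold in silence"

print(len(shared_shingles(earlier, later)))  # prints 5
print(reuse_share(earlier, later))           # prints 0.5
```

Exact n-gram matching like this breaks down on noisy OCR text, so a real pipeline over digitized eighteenth-century books would need a matching step that tolerates character errors.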
Finnish survey on Digital Research Practice in the Arts and Humanities (October 2016, 239 participants)
Lessons of Digital Research Practice in the Arts and Humanities: Growing interest in new methods. When linguistics is excluded, the humanities currently have very little experience of using digital methods. Most sources for researchers in the department are NOT in digital format; this needs to be taken into consideration. The idea that we re-train humanities students to develop digital methods through programming on their own is unrealistic.
CHALLENGES WITH OPEN DATA: Institutions are reluctant to give full access to data. Why? The research process is not opened and research data is not shared in the Humanities. Transparency, reproduction, collaboration and new initiatives are missing. Why? Short answer: Cultural change takes time. We need concrete examples in the core field of the Humanities that actually prove the OPEN DATA PRINCIPLES useful.
Life-cycle thinking in digitalisation solutions: Life-cycle thinking (why, for example, ATT did not fully succeed); different fields have different needs with respect to digitalisation. There is no monolithic concept of a researcher. Taking the knowledge interest into account in everything would be very important. What level of reproducibility are we aiming at? Compatibility, version control, software citations, tool development with respect to our research data? Platform thinking: Mildred-type projects, that is, the development of digital infrastructure, should be integrated even more clearly so that they are research-driven (not easy; storage, metadata creation, data management, the actual use of the research data, sharing of the data and interoperability).
Guidelines for the research use of digital materials: The world is full of historical/digitized datasets. Simple and authoritative guidelines are needed so that the obvious cases are not pushed to legal experts whose time is scarce. Who will make these guidelines on legal aspects? ATT? Heldig collaboration? SSH research centre?
Open science and research data: The significance of research data will be great in the future -> it must be taken into account in all planning, bearing in mind that it is also a competitive advantage in itself. This includes, among other things, reward mechanisms similar to those for publications.
Centralising digital infrastructure sensibly: On the SSH side, digital infrastructure and all the related administration, guidance, training, planning, open science and so on should be centralised. This is why the SSH Research Centre, with an infrastructure side attached to it, is a good idea. Obtaining funding also becomes considerably easier when there are clear solution models in mind for data management and for acquiring, processing and opening data.
International setting: the problems are the same everywhere. Ecosystem thinking! For example, the Nordic questions of data mining newspapers are all aligned and face the same problems. We are putting a Nordic network together. DARIAH: a pan-European infrastructure for arts and humanities scholars working with computational methods. It supports digital research as well as the teaching of digital research methods. DARIAH working groups. HELDIG????
OPEN HELSINKI DIGITAL HUMANITIES ECOSYSTEM IN PRACTICE - INVOLVE AND MIX PEOPLE OF ALL BACKGROUNDS 1. Individual researchers, research groups and projects 2. Symposiums and other events 3. Research seminars 4. Lecture courses 5. Infrastructure and memory institutions. Keep it practical: involve researchers and memory institutions in project courses and hackathons, and students and memory institutions in research.