THE EFFECT OF STATISTICAL MACHINE TRANSLATION SYSTEM ON EXPANDING DATA WITH OUT-OF-DOMAIN CORPORA BY HUMAN EVALUATION


UNIVERSITY OF EASTERN FINLAND
PHILOSOPHICAL FACULTY
SCHOOL OF HUMANITIES
FOREIGN LANGUAGES AND TRANSLATION STUDIES
Translation studies and translation technology

Fangting Xu

THE EFFECT OF STATISTICAL MACHINE TRANSLATION SYSTEM ON EXPANDING DATA WITH OUT-OF-DOMAIN CORPORA BY HUMAN EVALUATION -- TRANSLATIONS OF SUBTITLES FROM ENGLISH TO FINNISH

MA Thesis
2016

ITÄ-SUOMEN YLIOPISTO -- UNIVERSITY OF EASTERN FINLAND
Tiedekunta -- Faculty: Philosophical Faculty
Osasto -- School: School of Humanities
Tekijät -- Author: Xu, Fangting
Työn nimi -- Title: The Effect of Statistical Machine Translation System on Expanding Data with Out-of-Domain Corpora by Human Evaluation -- Translations of Subtitles from English to Finnish
Pääaine -- Main subject: Translation studies and translation technology
Työn laji -- Level: Pro gradu -tutkielma
Päivämäärä -- Date: 2016
Sivumäärä -- Number of pages: Finnish abstract + appendix (21)

Tiivistelmä -- Abstract

This study tries to apply out-of-domain corpora to improve the performance of statistical machine translation from English to Finnish. Statistical machine translation (SMT) is considered to be the most advanced approach in the machine translation field. What matters most for an SMT system is parallel corpora, which can however be a problem for languages with a scarcity of data; Finnish is one of these languages. Therefore, this paper aims to find a way to alleviate the data scarcity of the Finnish language by adding out-of-domain corpora. The translation system is trained five times with different corpus settings. In the first, second and third settings, the training corpus contains 1 million, 2 million and 10 million sentence pairs of subtitles respectively. In the fourth and fifth settings, the training corpus contains 1 million sentence pairs of subtitles plus an out-of-domain corpus, namely the corpus of books (about 3,400 sentence pairs) and the corpus of Europarl (1 million sentence pairs) respectively. Five translations of the same source text are produced in this way, and BLEU scores are obtained as well. Three linguistics students evaluated the five translations sentence by sentence as impossible, possible or best translations. The evaluation results are analysed with chi-square tests from the quantitative perspective. The source sentences are also scored by counting the total number of impossible, possible and best translations across the three evaluations, in order to find out which sentences are improved or degraded by the different corpus settings, which paves the way for the qualitative analysis. Scored sentences are categorised into groups (improved, degraded or unchanged) and then analysed in detail in order to find out the potential linguistic difficulties for the translation system. The evaluation results are compared with the BLEU scores. They are mostly consistent with each other; that is, the in-domain translation trained with the largest corpus is the best translation. The quantitative results show that the out-of-domain corpora do not help to improve the translation quality. By qualitative analysis, adding more in-domain data is found to be useful in certain respects, for example more attention is paid to syntax. At the same time, several grammatical structures are found to be difficult for the statistical machine translation system in translating English-Finnish texts.

Avainsanat -- Keywords: Statistical machine translation; out-of-domain corpus; evaluation; scoring

Contents

1. Introduction
2. Statistical Machine Translation
   2.1 Machine Translation
   2.2 The Statistical-based Approach
   2.3 Moses
   2.4 Earlier Work on Corpora Effectiveness
3. Research Questions
4. Data
   4.1 Corpus
       4.1.1 OPUS -- the Open Parallel Corpus
       4.1.2 OpenSubtitles2016
       4.1.3 Corpus of Books
       4.1.4 Corpus of Europarl
   4.2 Source Text
5. Methodology
   5.1 Translate with Moses
       5.1.1 Corpus Preparation
       5.1.2 Language Model Training
       5.1.3 Training the Translation System
       5.1.4 Tuning
       5.1.5 Testing
   5.2 Evaluation and Scoring
       5.2.1 Evaluation
       5.2.2 Scoring
       5.2.3 A Small Research
6. Results Analysis
   6.1 BLEU
   6.2 Quantitative
   6.3 Qualitative
       6.3.1 Qualitative Analysis of In-domain Translation
       6.3.2 Qualitative Analysis of Out-of-domain Translation
       6.3.3 Analysis of the Small Research
7. Conclusion
References
Finnish Abstract
Appendix I
Appendix II
Appendix III

Acknowledgements

Over the previous one and a half years whilst writing this thesis, I have had an intense learning experience on both an academic and a personal level. I would like to express my sincere gratitude to the following people, who have supported and helped me so much throughout this period. First, my sincerest gratitude goes to my supervisor Jukka Mäkisalo, who has been supportive throughout the whole process. He guided me in every stage of my work, from the original planning phase to the final corrections. Every time I came across a problem, he helped me to resolve it with valuable guidance. I would also like to thank Professor Stefan Werner, who helped me with programming problems at the beginning of my experiments. He replied very promptly and answered my small questions with the greatest patience. I would like to thank Matt and Xinyang Ni. Matt explained the basic terms in programming to me. Xinyang and I spent nights working on Moses by remote control; the translations could not have been done in time without his help. In addition, I would like to thank Emma Takkunen, Aaro Paavali Lappalainen and Tuomas Kuosmanen, who were willing to be participants in this work. The evaluation part could not have been done without their help. I would also like to thank Suvi Kananen, who did the 5-scale evaluation for me. Moreover, my gratitude also goes to Anni, Sanna and Sini, who did Spanish-Finnish translation post-editing for me, though the post-editing is no longer part of the work. I would also like to thank Maarit Koponen, who gave me valuable advice on post-editing. Last but not least, I want to thank my family for being supportive both psychologically and financially the whole time.

1 Introduction

Machine translation has always had the aim of producing high-quality translation without too many human resources being involved. It is doing better than ever with the development of statistical machine translation, for which the quality of the training data is of the utmost importance. Engines like Moses (Koehn et al., 2007) create improved translations when presented with larger and larger parallel corpora, especially with in-domain training data (Tufis et al., 2014). However, collecting parallel data from a particular domain in large enough quantities has been one of the challenges in developing SMT systems, especially for languages with fewer resources. Languages like Finnish and German use morphology to express additional information, such as case endings. Finnish is nevertheless more special compared to other European languages, mainly in the following respects (Karlsson, 2009): first, it has around 15 case endings, which is far more than other European languages such as German (four case endings) or Polish and Czech (both have seven case endings); second, Finnish uses endings where Indo-European languages typically have independent words, as with the possessive suffixes, which correspond to possessive pronouns in English, e.g. kirja/ni 'my book', kirja/si 'your book', kirja/mme 'our book' and so on; third, Finnish has another set of particular endings called enclitic particles, which come last, after all the other endings. Their function can be emphasis or interrogation, e.g. sanon/han 'I do say', minä/kin 'me too', on/ko 'is it?'. All these features make Finnish stand out as a tougher language for translation engines like Moses to deal with, as stated by Philipp Koehn: translating into morphologically rich

languages often requires additional information for choosing the correct word form that may not be readily available in the input language (Koehn, 2010: 43). Moses works well with languages like English, Spanish and French, and most of the early research on Moses was done with those languages because of their wide use. For Finnish, however, given its linguistic difficulty as well as its limited use outside Finland, not much work has been done with Moses yet, and there is still a long way to go before it reaches a high level of translation quality. The effectiveness of in-domain and out-of-domain corpora in improving the performance of translation models has been studied with Moses because of the scarcity of in-domain parallel corpora. Adding an out-of-domain corpus has been shown to be effective in some circumstances and ineffective in others (e.g. Aydin and Ozgur, 2014; Tufis et al., 2014). Therefore, this paper aims to investigate which method is better for improving the machine translation quality by adding corpus data, from two perspectives: first, by adding a corpus from the same domain (an in-domain corpus), that is, by increasing the quantity of the corpus; second, by incorporating an out-of-domain parallel corpus into the base corpus. Five translations produced by Moses from the same source text will be compared, with the language pair being English to Finnish. The first translation is produced with the subtitle corpus containing 1 million sentence pairs; the second and third translations are produced with the subtitle corpus containing 2 million and 10 million sentence pairs respectively (covering the previous amount of corpus); the fourth and fifth translations are produced with joint corpora created by adding the out-of-domain corpora, the corpus of Books and the corpus of Europarl respectively, to the subtitle corpus with 1 million sentence pairs. After the translation, the evaluation metric BLEU (Papineni et al., 2002) will be employed to obtain the scores of the five translations. Three Finnish native speakers, who are also linguistics students, will evaluate

the translations in three degrees: impossible, possible and best. Based on the evaluation results, chi-square analysis is conducted from a quantitative perspective in order to support or contradict the results of BLEU. We then investigate the translations in detail by first scoring the source sentences and then finding out which grammatical structures in the source texts are potentially difficult for a machine translation system, from a qualitative perspective. In addition, it remains questionable whether an extra out-of-domain corpus actually helps the machine translation or whether, worse than being useless, it might even lower the quality of the translation. These are the problems I wish to investigate through the experiments.

2 Statistical Machine Translation

2.1 Machine translation

Machine translation began its history in the 1950s, and its development can be divided into four stages: the pioneer years, the first generation, the second generation and the third generation (C. K. Quah, 2006). The period before the 1980s is considered the pioneer years of machine translation, consisting of high expectations of machine translation at the beginning and a complete denial of its usability afterwards. The first generation refers to direct translation, which was the first method used in machine translation development. It can be described as a dictionary-based system that aligns each source-language word to its target-language equivalent and involves little linguistic analysis. The second generation refers to rule-based approaches, which involve using morphological, syntactic or semantic rules for analysing a source-language text and synthesis of a target-language text (Carl and Way, 2003b: xviii). The third generation made a big improvement by applying corpora to translation, hence the name corpus-based approaches. Mona Baker was the first to put forward the idea of applying corpus linguistics techniques and methods to translation studies (Baker, 1993), though at that time the idea did not relate directly to machine translation. Corpus-based approaches gained popularity in the machine translation field in the early 1990s, consisting of example-based and statistical-based approaches which use linguistic information in a corpus to create new translations. Instead of using any syntactic or semantic rules, both example-based and statistical-based approaches rely on large electronic corpora of text to form patterns of equivalence, which is clearly quite different from earlier techniques such as rule-based methods that used predetermined linguistic

rules to generate new translations. Corpus-based approaches use parallel corpora, which consist of aligned texts: pairs of source- and target-language texts that are structurally matched, usually at sentence level, also known as bitext. A bilingual parallel corpus can be uni-directional or bi-directional. Based on the aligned texts, the statistical approach carries out statistical calculations to find the probabilities of various translation equivalents, while the example-based approach extracts examples from the aligned bilingual texts by matching. Corpus-based machine translation systems all use a group of what are referred to as reference translations, which contain source-language texts and their translations. The source- and target-language texts here are aligned. Following on from this, the equivalent translation is extracted using a specific statistical method as well as by matching a number of examples extracted from the corpus (Carl, 2000: 997). Both the USA (IBM) and Japan were working on these two approaches at that time, with the former concentrating on the statistical-based method and the latter experimenting with the example-based method.

2.2 The statistical-based approach

Statistical machine translation started its career in the late 1980s with a project called Candide, developed by IBM using statistical methods and based on a corpus of more than two million French and English sentences of transcribed Canadian parliamentary debates known as Hansards (C. K. Quah, 2006). As early as 1960, some experiments on this method had already been carried out by IBM, but they were not successful at that time. The original approach of IBM was to map individual words to words and allow for the deletion and insertion of words. Later, better translation quality was achieved with the use of phrase translation.

Later on, a stochastic technique based on Bayes' theorem revived the use of statistical methods in machine translation research (C. K. Quah, 2006). In statistical machine translation, the translation system is trained on large quantities of parallel data from which the system learns how to translate small segments. Engines like Moses produce better translations when presented with larger and larger parallel corpora (Tufis et al., 2014). The translation system divides the source-language text into segments that are strings of words or phrases. These segments are compared to the parallel corpora, which contain the original texts and their translations. Then the statistical method of calculating the probability of a translation is applied to the aligned bilingual corpus to produce new target-language segments. The probability is calculated according to Bayes' rule (Brown et al., 1990), an application of Bayes' theorem $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$:

$$\Pr(T \mid S) = \frac{\Pr(T)\,\Pr(S \mid T)}{\Pr(S)}$$

In this formula, Pr(T) denotes the language model, which generates valid, fluent target sentences; Pr(S | T) is the conditional probability, calculated by the translation model, that the target sentence T produces the source sentence S. The decoder then does the actual translation, which is to find the target sentence T with the highest probability Pr(T | S). By choosing the target-language translation with the highest probability, the algorithm selects the translation it regards as most likely by combining information from both its translation model and its language model (C. K. Quah, 2006: 78).
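To make the decision rule concrete, the following is a minimal sketch (not part of the thesis) of how a decoder combines the two models. The candidate translations and all probability values are invented for illustration, and a real decoder searches a far larger space of candidates.

```python
# Minimal sketch of the noisy-channel decision rule described above.
# The candidates and all probabilities are invented for illustration only.

source = "talo on iso"  # hypothetical Finnish source sentence

# Translation model: Pr(source | candidate), how well each English candidate
# "explains" the Finnish source (made-up values).
tm_prob = {
    "the house is big": 0.030,
    "the house is house": 0.045,
    "big the is house": 0.030,
}

# Language model: Pr(candidate), how fluent each candidate is as English
# (made-up values).
lm_prob = {
    "the house is big": 0.0200,
    "the house is house": 0.0001,
    "big the is house": 0.0002,
}

# The decoder picks the candidate T maximising Pr(T) * Pr(S | T);
# Pr(S) is constant over the candidates and can be ignored.
best = max(tm_prob, key=lambda t: lm_prob[t] * tm_prob[t])
print(best)  # -> "the house is big"
```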

2.3 Moses

Moses is an open-source project, developed as a reference implementation of state-of-the-art methods in statistical machine translation (Koehn, 2011: 15). It offers two types of translation models: phrase-based and tree-based, the latter also called syntax-based. Its decoder is the benchmark for research in this field, which, apart from its free accessibility to the public, is also the main reason why it has become the dominant statistical machine translation system. In addition to the SMT decoder, the toolkit also includes tools for training, tuning and applying the system to many translation tasks (Koehn et al., 2007). Moses is a decoder (the "translator") which translates new sentences by finding the highest-scoring sentences in the target language in terms of exactness (according to the translation model) and fluency (according to the language model) from a list of candidate translations. Moses allows you to have your own individualised translation system, which works for your own purposes. All you need is a sentence-aligned parallel corpus.

2.4 Earlier work on corpora effectiveness

Generally, the larger the corpora are, the better the SMT works. Moses produces better translations when presented with in-domain training data (Tufis et al., 2014), but collecting parallel data from a given domain in sufficiently large quantities is not an easy task. Therefore, significant research on data selection and domain adaptation has been conducted to alleviate this data scarcity. Cohn and Lapata (2007) presented a method to make effective use of multi-parallel corpora by triangulation, the process of translating from a source language (English) to a target language (French) via an intermediate third language. Their results show that this novel method obtains more reliable translation estimates from small datasets. Wu et al. (2008) trained a baseline system with out-of-domain corpora and then used in-domain translation dictionaries and in-domain monolingual corpora to improve the

in-domain performance, with the language pairs being Chinese to English and English to French. They achieved absolute improvements in BLEU scores compared with the scores obtained using out-of-domain corpora only. Wang et al. (2012) attempted to build a system that works well in multiple domains simultaneously for 20 language pairs, including Finnish. Their method uses models of different domains in a combined system and automatically detects the domain and its parameters at runtime. It achieved improved translation accuracy. The experiments done by Tufis et al. (2014) showed that it is better to use in-genre data to translate texts of the same genre than to mix the data with out-of-genre parallel texts, though the domain is different. They experimented with three language pairs: English to German, to Romanian and to Spanish. Aydin and Ozgur (2014) proposed an approach for expanding the training data by including parallel texts from an out-of-domain corpus, with the language pair being English to Turkish. Their method is to rank the out-of-domain sentences using a language modelling approach in order to select the best out-of-domain sentences, which are then added to the training set using the vocabulary saturation filter technique. They pre-rank the out-of-domain parallel corpus based on sentence perplexities calculated with an in-domain language model. Their system achieved an improved BLEU score. Haddow and Koehn (2012) analyse the effect of out-of-domain data on statistical machine translation in two stages. First, they relate the effect of the out-of-domain data on translation performance to measures of corpus similarity; then they analyse the effect of adding the

out-of-domain data at different parts of the training pipeline in two domains and eight language pairs. Their results show that the out-of-domain data improves coverage and translation of rare words, but may degrade the translation quality for more common words. Generally, the studies above show that out-of-domain corpora can help to improve the performance of a translation system. Like the English-Turkish pair (Aydin and Ozgur, 2014), the English-Finnish pair is also a low-resource language pair for machine translation. Therefore, it is worth finding a way to alleviate the problem of data scarcity. Whether a system is improved or not is mainly measured with evaluation metrics such as BLEU in Wu et al. (2008) and Aydin and Ozgur (2014). In this work, the BLEU metric is also applied, but it is not the only standard of measurement. In addition, human evaluation is used to check the results of the BLEU evaluations. Based on the evaluations, detailed grammatical problems can be identified during the analysis of the sentences.
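As an aside, the perplexity-based data selection idea used by Aydin and Ozgur (2014) can be sketched roughly as follows. This is not part of the thesis experiments: the model file name, the example sentence pairs and the cut-off are hypothetical, and the sketch assumes that a KenLM language model trained on in-domain target-language text is available.

```python
# Rough sketch of perplexity-based selection of out-of-domain sentence pairs,
# in the spirit of the data-selection work cited above. Not used in this
# thesis; the file name, sentences and cut-off are hypothetical.
import kenlm  # Python bindings of the KenLM toolkit

# Language model trained on the in-domain (subtitle) Finnish text.
lm = kenlm.Model("indomain.subtitles.fi.klm")

def rank_by_perplexity(pairs):
    """Sort (en, fi) sentence pairs by in-domain perplexity of the Finnish side."""
    return sorted(pairs, key=lambda pair: lm.perplexity(pair[1]))

out_of_domain = [
    ("The committee adopted the report.", "valiokunta hyväksyi mietinnön ."),
    ("I bought him some neckties.", "ostin hänelle solmioita ."),
]

# Keep only the half of the out-of-domain pairs that look most 'in-domain'.
selected = rank_by_perplexity(out_of_domain)[: len(out_of_domain) // 2]
print(selected)
```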

3 Research Questions

It is generally known that the performance of a statistical machine translation system improves when it is provided with larger and larger amounts of data from the same domain, and declines when data from different domains is added. However, for a language like Finnish, a low-resource language, it is necessary to supplement scarce in-domain training data with out-of-domain data. Thus, this work is carried out from two perspectives: quantitatively, it aims to find a way to improve the performance of Moses with Finnish; qualitatively, it aims to find out the detailed reasons why Moses does not work well with the Finnish language. Based on this, the following questions are proposed:

1. Will adding more in-domain corpus data improve the translation quality?
   a. Is the 2m translation better than the 1m translation, and the 10m translation better than the 2m and 1m translations?
   b. What linguistic (syntactic or lexical) problems are solved in the 2m and 10m translations compared to the 1m translation?
2. Will adding out-of-domain corpus data improve the translation quality?
   a. Are the parl1m and bk1m translations better than the 1m translation?
   b. Is the parl1m translation better than the 10m translation?
   c. What linguistic (syntactic or lexical) problems are solved in the bk1m and parl1m translations compared to the 1m translation?
3. Are the BLEU scores of the out-of-domain translations higher than those of the in-domain translations?
   a. What are the BLEU scores of the five translations?
   b. Are the BLEU scores consistent with the evaluation results?

4. Are there differences between the evaluators?
   a. Do the evaluators have different standards in their evaluations?
   b. Do the evaluation results contradict each other?
5. What are the differences between the improvements obtained by increasing the size of the corpus and those obtained by adding an out-of-domain corpus?
   a. Do the 10m translation and the parl1m translation show the same kinds of improvement?
   b. What improvements do 10m and parl1m show respectively?
6. Do the changes brought by increasing the size of the corpora only make translations better?
   a. Do the changes happen only in improved translations?
   b. Are there any changes in the translated sentences which are evaluated as impossible?
7. What grammatical structures in the source sentences are difficult for SMT?

4 Data

4.1 Corpus

The corpora used in this work consist of the corpus of OpenSubtitles2016, the corpus of Books and the corpus of Europarl, chosen from a free online source called OPUS, with the language pair being English to Finnish. All the texts are sentence-aligned.

Subtitle (English): On the way, I bought him some neckties for a present. He liked blue. He never thought of buying things like that for himself. And then I stopped at the minister's to remind him that Joe and I would be there at 4:00, and not to forget. We'd been away from each other so long. More than a year.
Subtitle (Finnish): Ostin matkalla hänelle solmioita lahjaksi. Hän piti sinisestä. Hän ei olisi ikinä ostanut mitään sellaista itse itselleen. Poikkesin papin luona muistuttamassa että Joe ja minä tulisimme neljältä, ettei hän vain unohtaisi. Olimme olleet niin pitkään erossa. Yli vuoden.
Books (English): Holmes sat in silence in the cab as we drove back to Baker Street, and I knew from his drawn brows and keen face that his mind, like my own, was busy in endeavouring to frame some scheme into which all these strange and apparently disconnected episodes could be fitted.
Books (Finnish): Holmes istui äänettömänä minun rinnallani ajaessamme takasin Baker Streetille, ja hänen rypistetyt kulmakarvansa ja kasvojen terävä ilme ilmottivat minulle, että hän, niinkuin minäkin, koetti sommitella niitä puitteita, joiden sisään kaikki nämä näennäisesti hajanaiset seikat sopisivat.
Europarl (English): The European Union or individual States must not take over from economic operators, but public authorities must define the rules and objectives which enable the economy to develop in a sustainable fashion.
Europarl (Finnish): Euroopan unionia, valtioita, ei saa korvata taloudellisilla toimijoilla, vaan julkisen vallan on määriteltävä ne säännöt ja tavoitteet, jotka mahdollistavat talouden kestävän kehityksen.
table 1----corpora texts

The above table gives examples of the three genres. The contents of subtitles mainly consist of conversations, where slang and colloquial expressions are widely used, which means that their register tends to be the least formal of the three corpora, while Europarl is the most

formal one, as it collects the speeches from the meetings of the European Parliament. As for the corpus of Books, it is less colloquial than the corpus of subtitles and less formal than the corpus of Europarl, as it contains descriptive content as well as conversational contexts.

4.1.1 OPUS -- the open parallel corpus

OPUS is an open resource which provides a growing collection of translated texts as well as tools for processing parallel and monolingual data. Texts are aligned from free online data and thus become parallel corpora available to the public. The focus in OPUS is to provide freely available data sets in various formats together with basic annotation, to be useful for applications in computational linguistics, translation studies and cross-linguistic corpus studies (Jörg, 2012). OPUS tries to cover a large range of data, especially for under-resourced languages and domains. It covers over 90 languages and includes data from several domains, such as a collection of translated literature, news commentary, GNOME localization files and so on, among which the largest domains are legislative and administrative texts, translated movie subtitles and localization data from open-source software projects (ibid.). The formats that OPUS provides are XCES/XML (data in the native XML format), Moses (data in plain text format, encoded in UTF-8) and TMX (data in Translation Memory eXchange format).

4.1.2 OpenSubtitles2016

The corpus of OpenSubtitles is a collection of documents from a website which provides an extensive multilingual collection of movie subtitles translated by users; commercial use of it is prohibited. The first version of OpenSubtitles was created in 2006 (Jörg, 2009), with 361 bitexts in 30 languages. The total number of files is 20,400, and the total number

of tokens is M. Since the extension of OpenSubtitles2011, not only the size but also the content has been expanded. Subtitles cover various genres and combine features from spoken-language corpora and narrative texts, including many dialogs, idiomatic expressions, dialectal expressions and slang (Jörg, 2012). OpenSubtitles2016 (Pierre & Jörg, 2016) includes all the previous distributions from 2009 onwards. It covers 1,689 bitexts in 60 languages, which is a large expansion compared with the first version. It has 2,815,754 files in total with 2.6 billion sentences (17.2 billion tokens). The corpus chosen for this work is in Moses format, containing 19,187,945 sentence pairs and million words.

4.1.3 Corpus of Books

The corpus of Books is a collection of literary works, such as Sense and Sensibility by Jane Austen or La Dame aux Camélias by Alexandre Dumas. Compared with the corpus of subtitles, it is a very small corpus with only 64 bitexts in 16 languages, containing 158 files with 19.50M tokens. Most books contain an English text; however, the corpus used for this work contains only two works: The Hound of the Baskervilles by Sir Arthur Conan Doyle and Les Dieux ont soif by Anatole France. It contains 3,645 sentence pairs and 0.10 million words in Moses format.

4.1.4 Corpus of Europarl

The corpus of Europarl (Koehn, 2005) is a multilingual parallel corpus for statistical machine translation, which is extracted from the proceedings of the European Parliament, covering 21 European languages such as French, German, Czech, Finnish and so on. It has

211 bitexts, containing 187,072 files with M tokens. The one used for this work (English-Finnish) contains 1,968,346 sentence pairs and million words.

4.2 Source text

The source text is an excerpt from the subtitles of an American film called Where the Red Fern Grows. The reasons for choosing this text are the following: first, it needs to be subtitle text, as the corpus used is from the subtitle domain; second, it has not been translated into Finnish yet; third, the film is adapted from a novel, and in one experiment the corpus is combined with literature, one of the out-of-domain corpora. The source text contains 229 sentences and 1,316 words.

5 Methodology

The flow chart below shows all the steps of this work, which consists of two main parts. The first part is to translate one source text with Moses; the second part is to evaluate and analyse the translations produced by Moses.

[Flow chart: download OpenSubtitles2016 and extract 1 million / 2 million / 10 million sentence pairs; download Books and combine with the 1m corpus; download Europarl and combine with the 1m corpus; translate with Moses (corpus preparation: tokenisation, truecasing, cleaning; language model; training the translation model; tuning; testing); evaluate the translations; score the evaluations; analyse the sentences.]

5.1 Translate with Moses

The first major process is to translate one source text using the statistical machine translation engine Moses. For this, a phrase-based translation model is built, as it is currently the most popular model as well as the easiest one to train compared to other models such as the factored model (Koehn et al., 2007). It is trained with five corpus settings: three with the domain-specific corpus (the corpus of subtitles) in different sizes, and two with added out-of-domain corpora (the corpus of Books and the corpus of Europarl). The translation process consists of five stages: corpus preparation, language model training, training the translation system, tuning and testing.

5.1.1 Corpus Preparation

The first step is tokenisation. According to the definition of tokens in the Moses manual (Koehn, 2011: 358), tokens are sequences of characters, such as punctuation or symbols, separated by a space. They are the basic unit in a machine translation process. Tokenisation aims to insert spaces between words and punctuation. Truecasing is the process in which the initial words in each sentence are converted to their most probable casing in order to reduce data sparsity. In other words, it makes sure that the machine translation output has the right case in words that have capital letters. Finally, as the last step of corpus preparation, cleaning removes long, empty and mis-aligned sentences, which can cause problems in the training pipeline.
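As an illustration only (the actual experiments use the standard Moses preprocessing scripts), the cleaning step can be pictured as a simple filter over sentence pairs; the length limits below are hypothetical.

```python
# Simplified, illustrative version of the corpus-cleaning step described above:
# drop empty, overlong and badly mis-aligned sentence pairs. The real pipeline
# uses the Moses cleaning script; the limits here are hypothetical.
def clean_parallel_corpus(pairs, max_len=80, max_ratio=9.0):
    cleaned = []
    for src, tgt in pairs:
        src_tokens, tgt_tokens = src.split(), tgt.split()
        if not src_tokens or not tgt_tokens:          # empty line on either side
            continue
        if len(src_tokens) > max_len or len(tgt_tokens) > max_len:  # too long
            continue
        ratio = len(src_tokens) / len(tgt_tokens)
        if ratio > max_ratio or ratio < 1.0 / max_ratio:  # probably mis-aligned
            continue
        cleaned.append((src, tgt))
    return cleaned

corpus = [
    ("I was a little boy .", "Olin pikkupoika ."),
    ("", "Tyhjä rivi ."),                            # empty English side: removed
    ("Need some help with that feed ?", "Tarvitsetko apua ?"),
]
print(clean_parallel_corpus(corpus))  # keeps the first and third pair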

5.1.2 Language Model Training

The language model is used to ensure the fluency of the output of a statistical machine translation system, and it is built with LM tools such as IRSTLM (Federico et al., 2008), RandLM (Talbot and Osborne, 2007) or KenLM (Heafield, 2011). It is essential to any statistical machine translation system. It is built with the target-language file, or by adding more data to the target-language file you already have. In this work, only the target-language file is used for LM training. The language model calculates the probability of a word following a given sequence, or the probability of a given sentence, and helps the translation system to find the right word translation or word order. In this work, IRSTLM was tried first, but it did not work due to some unclear technical problems. KenLM was then applied, as it handles large corpora better and faster than other language model tools and, in the case of Moses, it is already included by default, which means it does not need to be installed separately, while IRSTLM does. Basically, what the language model does is to assign a higher probability to a better translation when there are several options for the translation system. For example:

P_LM(the man is tall) > P_LM(tall the is man)
P_LM(he is going home) > P_LM(he is going house)

Therefore, a probabilistic language model P_LM prefers a correct word order as well as a better word choice, and it assigns the former sentence in each pair a higher probability (Koehn, 2010: 181).
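The inequality above can be made concrete with a toy model. The following sketch is not from the thesis and is far simpler than the KenLM model actually used: it trains an add-alpha smoothed bigram model on an invented handful of tokens, just to show why the fluent sentences get the higher probability.

```python
# Toy bigram language model, only to make the inequalities above concrete.
# The "training corpus" is invented; real systems use KenLM on millions of
# sentences and work in log-probabilities.
from collections import Counter

corpus = "the man is tall . he is going home . the man is going home .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_lm(sentence, alpha=0.1, vocab_size=1000):
    """Probability of a sentence under an add-alpha smoothed bigram model."""
    tokens = sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)
    return prob

print(p_lm("the man is tall") > p_lm("tall the is man"))      # True
print(p_lm("he is going home") > p_lm("he is going house"))   # True
```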

5.1.3 Training the Translation System

This is the main part of the whole process, where the translation model is actually produced. The process involves running the word alignment, producing the phrase table and creating the Moses configuration file. The word-alignment tool used in this work is MGIZA (Gao and Vogel, 2008). On the premise of sentence-aligned data, word-aligned data can significantly improve the translation quality: incorporating word-level alignments into the parameter estimation of the IBM models reduces the alignment error rate and increases the BLEU score when compared to training the same models only on sentence-aligned data (Callison-Burch, 2004). The phrase table is the core of the translation system; it contains phrase-to-phrase translations extracted by MGIZA from the training corpus, with their probabilities computed. The Moses configuration file, i.e. Moses.ini (the file can be found in Appendix II), is for the decoder.

5.1.4 Tuning

Tuning is a process aiming to improve the quality of the MT output: the translation system tries to find the best weights between the components of the training (the phrase table, the language model and the other models) to achieve the best quality by translating a tuning corpus repeatedly (Machada and Hilario, 2014). It requires a small amount of parallel data; a corpus of 500 to 2,000 segments is generally considered enough (ibid.), and it is usually run at the end of the training of a corpus. It is the most time-consuming part of the whole training process.

5.1.5 Testing

Testing is the process of measuring how good the translation system is and of obtaining the BLEU score, which is another important parameter of this work. Even though the translation model is ready to work at this point, it takes a long time to load the phrase table; the decoder can therefore be made to start quickly by binarising the phrase table and the lexicalised reordering models. In order to measure how good the translation system is, another parallel data set is needed as the test set. The data need to be tokenised and truecased in exactly the same way as in corpus preparation. The decoder then translates the test set and the BLEU script is run.
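For illustration (this is not the script used in the pipeline, which relies on Moses' own BLEU tooling), a corpus-level BLEU score can also be computed with NLTK; the tokenised hypothesis and reference sentences below are invented.

```python
# Minimal illustration of corpus-level BLEU, computed with NLTK rather than
# the Moses scoring script; the hypothesis/reference sentences are invented.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [
    [["kuinka", "voin", "auttaa", "?"]],          # one reference per sentence
    [["se", "ei", "ole", "hyväksi", "."]],
]
hypotheses = [
    ["kuinka", "voin", "auttaa", "?"],
    ["se", "ei", "ole", "hyvä", "."],
]

smooth = SmoothingFunction().method1
print(corpus_bleu(references, hypotheses, smoothing_function=smooth))
```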

In summary, one English text is translated into Finnish in five different versions based on different settings of the training corpus.

setting 1: domain Subtitle; 1 million sentence pairs; 7 mil en + 4 mil fi words
setting 2: domain Subtitle; 2 million sentence pairs; 14 mil en + 8 mil fi words
setting 3: domain Subtitle; 10 million sentence pairs; 69 mil en + 44 mil fi words
setting 4: domain Subtitle + Book; 1 mil + 2.6 k sentence pairs; 7 mil en + 4 mil fi words
setting 5: domain Subtitle + Parliament; 1 mil + 1 mil sentence pairs; 32 mil en + 21 mil fi words
table 2----five settings of corpora

In the first setting, the corpus contains 1 million sentence pairs from OpenSubtitles2016. The reason for choosing the sentence pair as the baseline unit is that corpora for SMT are usually sentence-aligned (Koehn, 2009), as well as the availability of enough data for the experiment. In the second setting, another 1 million sentence pairs from OpenSubtitles2016 are added to the previous 1 million corpus, which becomes a corpus of 2 million sentence pairs. In the third setting, the corpus is increased to 10 million sentence pairs, containing the corpora used in the previous two settings. All three trainings are tuned with the same test sets in order to keep the variables as few as possible. In the fourth setting, the out-of-domain corpus Books is added to the corpus with 1 million sentence pairs, so the new corpus contains 1,003,645 sentence pairs. In the fifth setting, the other out-of-domain corpus, Europarl, is likewise added to the corpus with 1 million sentence pairs. Thus:

1m: Translation generated by the model trained with a corpus containing 1 million sentence pairs from OpenSubtitles2016.
2m: Translation generated by the model trained with a corpus containing 2 million sentence pairs from OpenSubtitles2016, covering the first 1 million sentence pairs.
10m: Translation generated by the model trained with a corpus containing 10 million sentence pairs from OpenSubtitles2016, covering the first 2 million sentence pairs.
bk1m: Translation generated by the model trained with a joint corpus of 1 million sentence pairs from OpenSubtitles2016 and 3,645 sentence pairs from Books.
parl1m: Translation generated by the model trained with a joint corpus of 1 million sentence pairs from OpenSubtitles2016 and 1 million sentence pairs from Europarl.

The translated texts are compiled into a table (table 3):

ST: I was a little boy growing up deep in the Ozark mountains with my mom and dad and my two little sisters,
1m: minä olin pikkupoika kasvaa syvään sisään ja vuoret Ozark äitini ja isäni ja kaksi pientä sisaret.
2m: minä olin pikkupoika kasvaa, - syvällä pelätyn Ozark vuoret - ja kaksi pientä.
10m: olin pieni poika kasvaa syvällä vuorilla Ozark äidin ja isän kanssa ja kaksi pikku sisaret.
bk1m: minä olin pikkupoika - syvälle kasvaa Ozark vuoret äitini kanssa ja isäni ja kaksi pientä.
parl1m: minä olin pikkupoika, joka kasvaa syvälle vuorien Ozark vuoria ja äitini ja isäni ja kaksi pientä sisaret.

ST: and woods in all directions as far as the eye could see.
1m: ja metsässä suuntaan niin pitkälle kuin silmä näkyi.
2m: ja metsässä suuntaan niin pitkälle kuin silmä näkee.
10m: ja suuntaan metsää silmänkantamattomiin.
bk1m: ja metsässä, suuntaan niin pitkälle kuin silmä näkyi.
parl1m: ja kaikkiin suuntiin, sikäli kuin se voisi nähdä.

ST: My ma was part Cherokee,
1m: - Minun oli osa Cherokee.
2m: - Minun oli Cherokee.
10m: mun äiti oli osa Cherokee.
bk1m: hyvä äiti oli osa Cherokee.
parl1m: - Veljeni kuului Cherokee.

table 3----five translations

5.2 Evaluation and Scoring

In general, the data from the evaluation has been analysed from two perspectives. Quantitatively, chi-square analysis, a statistical test commonly used to compare observed data with the data we would expect to obtain according to a specific hypothesis, is conducted with different non-parametric variables (significance levels: p < .05 significant; p < .01 highly significant; p < .001 very highly significant). In other words, it is used to investigate whether the translations differ from one another by comparing the evaluation results. Qualitatively, the source text and all the translations are analysed sentence by sentence in detail, in order to answer the qualitative research questions, for example which typical structures in the source sentences are harder for SMT.

Three students evaluated all the translations sentence by sentence. All of them are Finnish native speakers. Evaluator one majors in English Language and Translation Studies, in the fifth year of the Master's degree, with a minor in Finnish Language and Translation Studies. Evaluator two is also a Master's student majoring in English Language and Translation Studies. Evaluator three is a Bachelor's student of General Linguistics. All of them are experienced in English-Finnish translation work.

5.2.1 Evaluation

The three students evaluated each sentence in each translation as impossible, possible or best by giving a score of 1, 2 or 3 respectively. For example (the evaluations below are from Evaluator 1):

ST: What can I do for you?
1m: mitä voin tehdä vuoksesi? (1)
2m: kuinka voin auttaa? (3)
10m: kuinka voin auttaa? (3)
bk1m: mitä voin tehdä? (1)
parl1m: miten voin auttaa? (2)

ST: we're going to have to set up camp before dark.
1m: meidän täytyy Perustakaa leiri ennen pimeää. (1)
2m: meidän täytyy Perustakaa leiri ennen pimeää. (1)
10m: - Pystytetään leiri ennen pimeää. (2)
bk1m: meidän täytyy leiriytyä ennen pimeää. (3)
parl1m: meidän on perustettava leiriin ennen pimeää. (3)

table 4----evaluation of translations (1: impossible; 2: possible; 3: best)

In the first sentence, 1m and 10m are evaluated as impossible with a 1, parl1m is evaluated as possible with a 2, while bk1m and 2m are evaluated as best with a 3. After the evaluation, the total numbers of impossible, possible and best translations given by each evaluator are counted into the following table. As for what counts as possible, impossible or best, there is no specific rule to regulate this; the evaluators judged whether the translations were acceptable or readable according to their own knowledge.

table 5----evaluation results (numbers of impossible, possible and best sentences in each of the five translations, for each of the three evaluators)

The above evaluation results are analysed with chi-square analysis, as explained in the chapter on results analysis.

5.2.2 Scoring

After the chi-square analyses of the evaluation data, some of the research questions from the quantitative perspective can be answered. In order to move on to the qualitative research, which investigates the grammatical structures in the source text and the translated sentences, it is then necessary to categorise the sentences so as to know where the changes actually happened in the five translations. Do the same sentences get marked impossible by everyone, or does a sentence get impossible in evaluation 1 but possible in evaluations 2 and 3? In order to solve this problem, a criterion is needed to group the sentences. Translations evaluated as possible or best are marked with a score of 1, while those evaluated as impossible are scored as 0. If a sentence is attributed 3 points, it means that all three evaluators marked it as possible (or best), while if it gets 1, only one evaluator marked it as possible and two evaluators marked it as impossible. Thus, the possible scores are:

0 = all impossible
1 = 1 possible or best, 2 impossible
2 = 2 possible or best and 1 impossible (or 1 possible, 1 best and 1 impossible)
3 = all possible or best
table 6----scoring

Under this scheme, every translation is scored. For example:

ST: When they found them in the spring,
Evaluator 1: kun he löysivät ne keväällä. (2)
Evaluator 2: kun he löysivät ne keväällä. (2)
Evaluator 3: kun he löysivät ne keväällä. (2)
score-1m: 3

ST: a beautiful red fern had grown up between them.
Evaluator 1: kaunis punaisessa fern oli kasvanut välissä. (1)
Evaluator 2: kaunis punaisessa fern oli kasvanut välissä. (1)
Evaluator 3: kaunis punaisessa fern oli kasvanut välissä. (1)
score-1m: 0

ST: And that spot was sacred forever.
Evaluator 1: ja tämä paikka oli pyhä ikuisesti. (3)
Evaluator 2: ja tämä paikka oli pyhä ikuisesti. (2)
Evaluator 3: ja tämä paikka oli pyhä ikuisesti. (1)
score-1m: 2

table 7----score of the 1m translation
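The scoring rule of tables 6 and 7 amounts to the following few lines; this is only a sketch of the rule, not a script from the thesis.

```python
# Sketch of the 0-3 sentence scoring described in table 6: each evaluator's
# 1/2/3 mark (impossible/possible/best) is collapsed to 0 or 1 and the three
# values are summed. The example marks are the ones shown in table 7.
def sentence_score(marks):
    """marks: the 1/2/3 marks given by the three evaluators to one sentence."""
    return sum(1 for m in marks if m >= 2)  # possible (2) or best (3) count as 1

print(sentence_score([2, 2, 2]))  # -> 3  "When they found them in the spring,"
print(sentence_score([1, 1, 1]))  # -> 0  "a beautiful red fern had grown up between them."
print(sentence_score([3, 2, 1]))  # -> 2  "And that spot was sacred forever."
```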

Table 7 above is from the scoring of the 1m translation. For the first sentence, all three evaluators judge the translation as possible by giving a 2, so the score of the first sentence is 3, while for the second sentence they all give a 1, so the score of the second sentence is 0. After scoring, the sentences are grouped based on their scores in order to know which sentences become better, which get worse and which stay unchanged.

ST 1: I was a little boy growing up deep in the Ozark mountains with my mom and dad and my two little sisters, -- score-1m: 0, score-2m: 0, score-10m: 0
ST 2: and woods in all directions as far as the eye could see. -- score-1m: 0, score-2m: 0, score-10m: 1
ST 3: My ma was part Cherokee, -- score-1m: 0, score-2m: 0, score-10m: 2
ST 4: When they found them in the spring, -- score-1m: 3, score-2m: 3, score-10m: 1
table 8----score of source sentences

In table 8, for sentence 1, all three translations score 0, meaning that the translation quality stays impossible in general, though this does not necessarily mean there are no minor changes in the translations. For sentence 2, 1m and 2m score 0, while 10m scores 1, indicating that one evaluator thinks the translation is possible, so the 10m translation quality is only slightly better than the previous ones. For sentence 3, 1m and 2m score 0, while 10m scores 2, showing that two evaluators mark the translation as possible or even best, so the 10m translation quality is improved significantly. For sentence 4, 1m and 2m score 3, indicating that both translations are of high quality, whereas the quality goes down in 10m, as it scores only 1. With this table, the qualitative analysis can proceed by categorising the sentences into four groups:

Cat 1: impossible to possible
Cat 2: possible to impossible
Cat 3: unchanged impossible
Cat 4: unchanged possible

5.2.3 A Small Research

Considering the coarseness of evaluating the sentences in only three levels (impossible, possible and best), a small study is carried out as a complement to the previous study, as well as to support its results. All the translated sentences are graded from 1 to 5 (table 9), which avoids the situation in the previous evaluations where, even though a translation gets a bit better, it is still not good enough to be marked as possible. Therefore, it is expected that in this way the translations of the out-of-domain corpora will turn out to be slightly better than the others.

ST: I was a little boy growing up deep in the Ozark mountains with my mom and dad and my two little sisters,
1m: minä olin pikkupoika kasvaa syvään sisään ja vuoret Ozark äitini ja isäni ja kaksi pientä sisaret. (2)
2m: minä olin pikkupoika kasvaa, - syvällä pelätyn Ozark vuoret - ja kaksi pientä. (1)
10m: olin pieni poika kasvaa syvällä vuorilla Ozark äidin ja isän kanssa ja kaksi pikku sisaret. (3)
bk1m: minä olin pikkupoika - syvälle kasvaa Ozark vuoret äitini kanssa ja isäni ja kaksi pientä. (2)
parl1m: minä olin pikkupoika, joka kasvaa syvälle vuorien Ozark vuoria ja äitini ja isäni ja kaksi pientä sisaret. (4)

ST: My ma was part Cherokee,
1m: - Minun oli osa Cherokee. (2)
2m: - Minun oli Cherokee. (1)
10m: mun äiti oli osa Cherokee. (4)
bk1m: hyvä äiti oli osa Cherokee. (3)
parl1m: - Veljeni kuului Cherokee. (1)

ST: and there was a legend in those parts that a little Indian boy and girl got lost in a blizzard and died.
1m: ja siellä oli legenda noissa sen vähän tavaroita, poika ja tyttö on kadonnut - ja kuoli. (1)
2m: ja siinä oli legenda siihen, että siellä on pieni poika ja tyttö on eksynyt ja kuoli. (3)
10m: ja siellä oli legenda, intialainen poika ja tyttö on menettänyt lumimyrskyssä ja kuoli. (4)
bk1m: ja siellä oli legenda niihin, että vähän intialainen poika ja tyttö on eksyväni ja kuoli. (2)
parl1m: ja siellä oli legenda niillä alueilla, jotka hieman Intian poika ja tyttö katosi ja kuoli. (1)

ST: What can I do for you?
1m: mitä voin tehdä vuoksesi? (4)
2m: kuinka voin auttaa? (5)
10m: mitä voin tehdä? (2)
bk1m: kuinka voin auttaa? (5)
parl1m: miten voin auttaa? (4)

table 9----score of sentences on a 5-point scale

With the help of one Finnish native speaker, every sentence is scored one by one. The translations are ranked from 1 to 5, from the worst to the best. For example, in the first sentence the translation of 2m is the worst one, scoring 1. The translations of 1m and bk1m score 2, which is slightly better than 2m but worse than 10m, which scores 3. The translation of parl1m scores 4, the highest of the five, so it is the best translation among them. There is no perfect translation of this sentence, as no translation scores 5, and the same is true of sentence 2. However, in sentence 3, the translations of 2m and bk1m both score 5, which means they are considered perfect translations.

6 Results Analysis

This chapter consists of three parts. First, the BLEU scores are presented. Second, quantitative analyses are conducted in order to support or contradict the results of BLEU. Third, based on the results of the quantitative analyses, the translations are studied from a linguistic perspective in order to find out the detailed reasons why the Finnish language remains difficult for a statistical machine translation system.

6.1 BLEU

As introduced in Chapter 2, Moses is a toolkit containing components to preprocess data, train language models and so on; the evaluation metric BLEU is also part of it. Thus, after the translations were produced, BLEU scores were also obtained (Appendix I) to evaluate the results (table 10).

table 10----BLEU scores of the five translations

According to table 10, the highest score (16.46) goes to 10m, meaning that 10m is the best translation among the five, and bk1m is of the lowest quality. Parl1m ranks in second place, which shows that the out-of-domain corpus does help to improve the translation quality slightly, although its score (12.28) is not even one point higher than that of the 1m translation (11.55). However, instead of 1m being the one of the lowest quality, as assumed in the hypothesis, it is slightly better than bk1m. The score of 2m is reasonable: it lies between the scores of 1m and parl1m, which is consistent with the hypothesis.

6.2 Quantitative

First, a series of analyses is conducted by counting all the impossibles, possibles and bests (i.e. the sentences evaluated as impossible, possible and best) of the three evaluators together; in other words, there are three variables: impossible, possible and best.

table 11----P-value of the five translations with three variables (3 x 5 table of impossible/possible/best counts per translation)

In the 1m translation there are 423 impossible sentences, 168 possible sentences and 93 best sentences according to the three evaluators. The total number of evaluated sentences is 684 over the three evaluators, that is, 228 sentences for each. A P-value of 0.06 is obtained, which is slightly higher than the threshold value (0.05), indicating that the result is not significant, which means that there is no significant association among the variables. In other words, not all of the five translations differ from one another, and the quality of some translations is more or less the same as that of others, even though they have different corpus settings. Thus, in order to obtain a significant result, all the bests are merged into the possibles.

table 12----P-value of the five translations with two variables (impossible/possible counts per translation)

In this case, there are only two variables, impossible and possible. The P-value obtained (table 12) is highly significant, indicating that there is a highly significant difference among the five sets of data; that is, some translations are statistically better than others due to the different settings of the corpora. From a more visual perspective, the bar chart (figure 1) illustrates the total numbers of impossibles and possibles in the five translations as evaluated by the three subjects.

Figure 1----total numbers of impossibles and possibles in the five translations

Overall, the impossibles of the five translations exceed the possibles. The 1m and parl1m translations have the largest difference between impossibles and possibles, while the smallest difference lies in the 10m translation. We can see that the highest number of impossible translations (423) goes to 1m, and the impossibles decrease as the quantity of the corpus increases, reaching the lowest point (365) at 10m. Correspondingly, the lowest number of possible translations (261) goes to 1m, and the possibles increase to 319 in proportion to the increase in corpus quantity. However, the data of the two translations made with out-of-domain corpora, bk1m and parl1m, are very similar to the 1m translation, which is a quite surprising result of this work. It is not clear whether or not this is due to the rather similar corpus quantity of bk1m and 1m. Based on this, a series of more detailed analyses is conducted using 2 x 2 chi-square tests. The 1m translation is compared with each of the other four translations one by one, as the goal is to find out which translations show the most improvement, i.e. differ the most from the 1m translation.

table 13----2 x 2 analysis (pairwise comparisons of 1m with 2m, 10m, bk1m and parl1m)
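As an illustration of this kind of test (not a script from the thesis), the 2 x 2 comparison of the 1m and 10m translations can be reproduced with SciPy from the pooled impossible/possible counts reported above (423/261 for 1m, 365/319 for 10m); the result agrees with the p-value of about 0.001 discussed below.

```python
# 2x2 chi-square test comparing the pooled impossible/possible counts of the
# 1m and 10m translations (counts taken from the evaluation results above).
from scipy.stats import chi2_contingency

counts = [
    [423, 261],  # 1m:  impossible, possible (all three evaluators pooled)
    [365, 319],  # 10m: impossible, possible
]

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # p is around 0.002, i.e. highly significant
```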

From the 2 x 2 analysis table we can see that the P-value of 1m and 10m (0.001) is the only significant one. Moreover, it is highly significant (p < 0.01) and close to very highly significant, for which the significance boundary is 0.001, indicating that the translation quality of the 10m translation differs greatly from that of the 1m translation, which tells us that the different corpus setting helps to make this difference. The other translations do not differ much from the 1m translation, as their P-values are not significant, all being larger than 0.05. In particular, in the last pair, the P-value of 1m and parl1m is close to 1 (0.95), indicating that its translation quality is more or less the same as that of the 1m translation, which means that the out-of-domain corpus does not help improve the translation quality in this case. On the other hand, an analysis is also conducted from the evaluators' perspective (table 14 and figure 2). Table 14 shows the total number of impossibles in each translation for each evaluator. For example, evaluator 1 (E1) marks 152 sentences as impossible in the 1m translation and 139 in the 2m translation, and evaluator 2 (E2) marks 106 sentences as impossible

in the 10m translation, and so on. The P-value shows an extremely high similarity among the three evaluators, meaning that the evaluations are highly consistent.

table 14----the total number of impossibles in each translation for each of the three evaluators

figure 2----the total number of impossibles in each translation for each of the three evaluators

According to figure 2, the lowest point falls on 10m, which means it has the fewest impossible translations across the three evaluations; in other words, it has the best translation quality. Unsurprisingly, the 1m translation reaches the highest point, which supports one of our hypotheses, namely that the 1m translation is of the lowest quality among the five translations. Starting

from 1m, all three lines decrease gradually, showing that increasing the quantity of the corpus does improve the quality of the translations. However, bk1m is only slightly lower than 1m, and parl1m almost reaches the same height as 1m. Two evaluators (blue and grey) are highly similar to each other, as their two lines almost overlap, while the third has a score around 20 points lower than the others; however, the trend of the line is consistent, which could mean that the criteria for evaluating translation quality differ a bit but are consistent in general. This is also consistent with the backgrounds of the three evaluators, since the two who share high similarity are both Master's students in Translation Studies, while the third is a Bachelor's graduate in General Linguistics.

Before continuing to the qualitative research, it is necessary to categorise the sentences in order to know where the changes actually happened in the five translations. We do not know whether the same sentences get marked impossible by everyone, or whether a sentence gets impossible in evaluation 1 but possible in evaluations 2 and 3. Therefore, in order to solve this problem, all the translated sentences are scored as described in Chapter 5. As shown in the 2 x 2 analysis (table 13), only the P-value of 1m and 10m is significant, so these two translations are scored respectively. All the scores are counted and compiled into the following table (table 15):

table 15----scores of 1m and 10m (cross-tabulation of the 0-3 scores of the two translations)

In the 1m translation, 122 sentences score 0, meaning that no evaluator marked these 122 sentences as possible; in the 10m translation the corresponding number is 93. 18 sentences score 1, meaning that for these sentences only one of the three evaluators marked the translation as possible. 82 sentences keep the same score of 0, indicating that 10m does not change them or makes only minor changes, not good enough to make the translation a decent one. 77 sentences keep the same score of 3, meaning that the 1m translation is already good enough and the 10m translation remains unchanged and does not degrade it. However, some sentences are degraded by increasing the quantity of the corpus. In this way, a criterion of what is possible and what is impossible is established. In order to simplify the scoring, a McNemar chi-square test (Hatch and Lazaraton, 1991), which is a paired version of the chi-square test, is carried out by categorising scores 0 and 1 as impossible and scores 2 and 3 as possible (table 16).

1m impossible, 10m impossible: 110
1m impossible, 10m possible: 30
1m possible, 10m impossible: 10
1m possible, 10m possible: 75
z-value: 3.16
table 16----Z-value of 1m and 10m

Thirty sentences become possible from impossible, and 10 become impossible from possible; 110 stay impossible and 75 stay possible. The z-value is 3.16 (z = (30 - 10) / sqrt(30 + 10) ≈ 3.16), which corresponds to a highly significant P-value (Hatch and Lazaraton, 1991). Thus, this result shows very high significance, meaning that there are some sentences in the 10m translation worth in-depth analysis.
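As a quick check (not a script from the thesis), the z-statistic of table 16 can be recomputed from the two discordant counts alone:

```python
# McNemar z-statistic for the paired impossible/possible outcomes of the 1m
# and 10m translations, using the transition counts of table 16.
from math import sqrt

b = 30   # impossible in 1m -> possible in 10m
c = 10   # possible in 1m  -> impossible in 10m

z = (b - c) / sqrt(b + c)
print(round(z, 2))  # -> 3.16, the value reported in table 16
```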

To sum up, a few conclusions can be drawn from the quantitative analysis. First, 10m is the best translation among the five settings, indicating that adding more in-domain corpus data improves the translation quality, so 2m is better than 1m and 10m is better than 2m. This is consistent with the BLEU results. Second, the evaluation results show that the quality of bk1m is slightly better than that of 1m, while parl1m, which has more data, is more or less of the same quality as 1m, which means that adding an out-of-domain corpus does not help much to improve the translation quality in this work. Third, there is no big difference among the three evaluators, two of whom are highly similar to each other.

6.3 Qualitative

This section aims to find out which grammatical structures make Finnish more difficult for Moses, as well as to find out the differences between the improvements of the in-domain translations and those of the out-of-domain translations. With the above results, we therefore start to extract the sentences by comparing their translations. The translations of the in-domain corpora are analysed first.

6.3.1 Qualitative analysis of in-domain translation

The sentences are divided into four categories:

Cat 1: impossible to possible
Cat 2: possible to impossible
Cat 3: unchanged impossible
Cat 4: unchanged possible

Cat 1: Impossible to possible

The translations that change from impossible to possible fall into two types:

1. Impossible in 1m, possible in 2m and 10m (11 sentences)

a. 2m and 10m choose better words

ST:  we're going to have to set up camp before dark.
1m:  meidän täytyy Perustakaa leiri ennen pimeää.
2m:  - Pystytetään leiri ennen pimeää.
10m: meidän täytyy leiriytyä ennen pimeää.

ST:  How long you've been saving up, boy?
1m:  Kauanko olet ollut pelastusta, poika?
2m:  Kauanko olet säästänyt?
10m: Kauanko olet säästänyt?
Table: Examples of better word choosing

In the first sentence, though perustaa leiri is a decent translation, 1m uses the imperative form of the verb (Perustakaa) and 2m uses the passive voice (pystytetään). Leiriytyä is the better choice, and it is in the infinitive form required after täytyy. In the second sentence, 1m makes a typical machine translation mistake, word sense ambiguity: choosing the right sense of a word in a given context (Koehn, 2010). Pelastus or pelastaa means save in the sense of to stop someone or something from being killed, injured, or destroyed. Here, however, save, especially with the particle up, refers to keeping something, especially money, for use in the future, which corresponds to säästää in Finnish.

b. 2m and 10m lose information, which leads to better translations

ST:  Need some help with that feed?
1m:  Tarvitsetko apua sen kanssa ruokaa?
2m:  Tarvitsetko apua?
10m: Tarvitsetko apua?

ST:  It's his hound.
1m:  se on hänen mikä lintukoira tuollainen.
2m:  se on hänen.
10m: se on hänen.
Table: Examples of information loss

In the first sentence, 1m translates the sentence word by word and loses the correct Finnish syntax, as in most cases. It tends to cover all the information of the source text, while in 2m and 10m the prepositional phrase with that feed is ignored, and keeping only the main part makes the translation more sensible as well as syntactically more idiomatic. In the second sentence, 2m and 10m fail to translate the word hound, which in this context actually makes them slightly better translations.

c. A more difficult morphological problem: the translative case

ST:  That ain't good for your character.
1m:  se ei ole hyvä.
2m:  se ei ole hyväksi.
10m: se ei ole hyväksi.

ST:  It'll make you feel better.
1m:  se piristää.
2m:  se saa olosi paremmaksi.
10m: se saa olosi paremmaksi.
Table: Examples of the translative case

In Finnish, the ending -ksi in hyväksi is called the translative ending, which generally expresses a state, property, function or position into which something or someone enters, or the end point of a movement or change (Karlsson, 2009: 187).

2. Impossible in 1m and 2m, possible in 10m (19 sentences)

a. 10m pays more attention to syntax

10m pays more attention to syntax, as it tends to keep the translation grammatically correct rather than covering all the information, some of which is unnecessary. It seems that

when 10m faces the choice between keeping as much information as possible and being as faithful to the meaning as possible, it inclines to the latter. For example:

ST:  My ma was part Cherokee,
1m:  - Minun oli osa Cherokee.
2m:  - Minun oli Cherokee.
10m: mun äiti oli osa Cherokee.

ST:  Yeah, y'all take care, now.
1m:  - Kamu, nyt.
2m:  - teistä huolta.
10m: - Pärjäilkää.

ST:  I sure seen better dogs in my day.
1m:  olen varma, että nähnyt parempaa koirat minun päivä.
2m:  en nähnyt parempiakin koiria.
10m: olen nähnyt parempia koiria.

ST:  Well, he ain't gonna forget.
1m:  hän ei ole unohda.
2m:  hän ei ole unohda.
10m: hän ei unohda.

ST:  Appreciate it.
1m:  Appreciate sen.
2m:  vastaan siihen.
10m: Kiitos.
Table: Examples of 10m paying more attention to syntax

In general, the 10m translations are pragmatically correct, although they lose some of the syntax and the accurate wording. In the first sentence, ma is a colloquial form of mum. Among all the translations we have, 10m is the only one that recognises this word. In addition, mun is a typical colloquial variant of minun in southern Finnish dialects. As the source text is not grammatically correct (My ma was part of Cherokee would be the correct form), all the translations fail to convey the original meaning. In the second sentence, the word kamu in the 1m translation means pal, probably corresponding to y'all in the source sentence. However, it does not capture the main meaning of the sentence, while the 10m translation does, even though it does not contain any other elements appearing in the source sentence. In the third sentence, the speaker omits the auxiliary verb have, and sure right after the subject I misleads the 1m translation system into reading it as I am sure. The 10m translation, however, is not tricked.

It focuses on the main structure by neglecting the adverb sure, which is a less important element of the sentence. In the fourth sentence, once again, the 10m translation system focuses on the main idea of the source sentence without being influenced by the colloquial expression.

b. Missing information leads to better translations, as in some of the 2m cases too

ST:  I'm just trying to sell it for him.
1m:  yritän vain myydä sen hänelle.
2m:  yritän myydä sen hänelle.
10m: yritän vain myydä sen.

ST:  Oh, I had it on me.
1m:  voi, minulla oli se minulle.
2m:  minulla oli.
10m: minulla oli se.

ST:  That'll calm you down.
1m:  se tulee rauhoittaa sinut.
2m:  se on rauhallinen.
10m: se rauhoittaa.

ST:  I sure seen better dogs in my day.
1m:  olen varma, että nähnyt parempaa koirat minun päivä.
2m:  en nähnyt parempiakin koiria.
10m: olen nähnyt parempia koiria.

ST:  He's the one that helped me.
1m:  hän on ainoa, joka auttoi minua.
2m:  hän on ainoa, joka auttoi minua.
10m: hän auttoi minua.
Table: Examples of missing information leading to better translations

As mentioned before, with the increase in corpus size, more attention is paid to the syntax of the translated sentences rather than to translating them word by word. The models simply fail to translate some elements, so some unnecessary information is lost: for example, for him is untranslated in 10m in the first sentence, and in my day is untranslated in both 2m and 10m in the fourth sentence.

c. 10m covers more and better words

10m covers more words, which is the main point of increasing the size of the corpus. The bigger a corpus is, the fewer out-of-vocabulary (OOV) problems it has.

ST:  You done your schoolwork?
1m:  Oletko tehnyt sinun koulukamoja.
2m:  - Olet tehnyt koulukamoja.
10m: Oletko tehnyt läksysi?

ST:  Appreciate it.
1m:  Appreciate sen.
2m:  vastaan siihen.
10m: Kiitos.
Table: Examples of 10m covering more and better words

In the first sentence, the colloquial koulukamoja (a partitive form of koulukama) literally refers to schoolstuff, while läksy is the standard word for homework. In addition, the ending -si in läksysi is a possessive suffix. Along with the genitive form (here: sinun) of the personal pronoun (sinä), possessive suffixes function as possessive pronouns, marking possession for the different grammatical persons. Genitive pronouns in the first and second persons can be omitted when they occur together with a possessive suffix (Karlsson, 1999), as in this sentence. Thus, 10m offers the better translation in terms of morphology. In the second sentence, the English word remains untranslated in 1m, which is the last thing we expect from a machine translation. Apparently, the corpus of the 1m translation does not cover the word appreciate.

Cat 2: Possible to impossible

From possible to impossible, there are four kinds of situations: possible in 1m and 2m, impossible in 10m (6 sentences); possible in 1m, impossible in 2m and 10m (3 sentences); possible in 2m, impossible in 1m and 10m (9 sentences); possible in 1m and 10m, impossible in 2m (7 sentences).

ST:  When they found them in the spring,
1m:  kun he löysivät ne keväällä.
2m:  kun he löysivät ne keväällä.
10m: kun he löysivät heidät.

ST:  And, sure as I'm living,
1m:  ja totta kuin elän.
2m:  ja totta kuin elän.
10m: ja niin olen.

ST:  It ain't no $35 dog.
1m:  se ei ole 35 dollarin koira.
2m:  se ei ole 35 dollaria.
10m: se ei ole 35 dollaria.

ST:  And that spot was sacred forever.
1m:  ja tämä paikka oli pyhä ikuisesti.
2m:  se on pyhä paikka.
10m: ja oli pyhä ikuisesti.
Table: Examples of possible to impossible

In general, the main reason for this degradation, whether from 1m to 2m or from 2m to 10m, is information loss. This is the main discovery of this work: the 10m translation is certainly better overall, as shown above, yet it also loses information, and the reason for this is not yet clear. This information loss sometimes results in poor translation quality and sometimes in still acceptable translation quality.

Cat 3: Possible to possible

In this category there are 70 sentences. Most of them are short sentences, phrases or words that are commonly used (table 24), which is to say, first, that a small corpus is enough for Moses to translate those sentences, and second, that Moses is still not able to deal with difficult structures of English-Finnish translation, since, as shown in Category 2, even though the 1m translation succeeds in translating a sentence, the system may still fail to translate it as the amount of corpus increases.

ST:  Is that right?
1m:  Niinkö?
2m:  Niinkö?
10m: Niinkö?

ST:  What are you talking about?
1m:  mitä tarkoitat?
2m:  mitä tarkoitat?
10m: mitä sinä puhut?

ST:  I want you to understand something, Billy.
1m:  haluan, että ymmärrät jotain, Billy.
2m:  haluan sinun ymmärtävän jotain, Billy.
10m: haluan sinun ymmärtävän jotain, Billy.
Table: Examples of possible to possible

Cat 4: Impossible to impossible

In Category 4, the unchanged impossible, there are 101 sentences in total. Several typical structures are especially difficult for the translation system. Sentences with prepositional phrases might be the biggest problem, since Finnish essentially has no prepositions and uses case endings instead. Rare words are a common problem for SMT: when there are no examples in the corpora, the translation system cannot recognise the words at all. There are further structures which would need to be discussed, such as participle structures, sentences that are literally correct but not consistent with the context, idiom translation, wrong segmentation, verb-plus-verb structures, and so on. Here we choose the most typical structures to analyse.

1. Prepositional phrases and case endings

ST:  a beautiful red fern had grown up between them.
1m:  kaunis punaisessa fern oli kasvanut välissä.
2m:  hieno punainen fern oli kasvanut.
10m: kaunis punainen hienosto oli kasvanut.

ST:  We're up from Tulsa.
1m:  olemme koulusta Oklahoman Tulsassa.
2m:  olemme tulossa Tulsassa.
10m: olet nyt Tulsasta.

ST:  He says you got the finest coon-hunting in the Ozarks.
1m:  hän sanoo, että sinulla on hienoin coon-hunting - Ozarks.
2m:  hän sanoi, että paras coon-hunting - Ozarks.
10m: hän sanoo, että sinun on paras coon-hunting - Ozarks.

ST:  I was down at Grandpa's store yesterday,
1m:  olin katselee ukki on kauppa eilen.
2m:  olin isoisä on eilen.
10m: olin isoisä on eilen.

ST:  and he said that old man Stanton's collie is about to have pups.
1m:  ja hän sanoi, että vanha mies, Stanton on collie on pitää pennut.
2m:  ja hän sanoi, että minulla on Stanton vanhus on pitää pennut.
10m: ja hän sanoi, että vanha mies Stanton on Collie on pitää pennut.

ST:  $35 is a lot to pay for a dog like that.
1m:  dollaria on paljon maksaa koira.
2m:  35 dollaria on paljon maksaa koira.
10m: 35 dollaria on paljon maksaa koira.
Table: Examples of translating prepositional phrases

The translation of prepositional phrases is the most common problem discovered in this research, which is not a surprise for English-Finnish translation. In the first sentence, apart from the rare word fern, between is another difficult point: though 1m tries to translate it as välissä, it still misses the object them, while 2m and 10m ignore between completely. The second sentence is a colloquial one with two prepositions coming together; we're up from Tulsa means we came from Tulsa. It is not clear why the 1m translation contains koulusta. 10m does better here, but it misreads the personal pronoun, as if the source were you are up from Tulsa. The third sentence should not have been a difficult one, as the preposition in is equivalent to a noun with the ending -ssa/-ssä, which is a common structure in the corpora. The fourth sentence has a prepositional phrase which the system fails to translate. The possible reason is either that there is no such phrase in the phrase table (the table that is created automatically when building a translation model), or that the system does not recognise it as a phrase. In the last sentence, for is ignored by the system.

However, in some cases 10m does better than 1m and 2m. For example:

ST:  Well, this road'll take you right to it.
1m:  No, tämä tie vie sinut suoraan sen.
2m:  tämä tie vie sinut suoraan sen luo.
10m: No, tämä tie vie sinut sinne.

ST:  Well, they're the best in Mr. Bellington's kennel.
1m:  No, he ovat parhaita herra Bellington on kennel.
2m:  -He ovat paras herra Bellington on kennel.
10m: No, he ovat parhaita herra Bellington kennelissä.

ST:  I have something else for you.
1m:  minulla on jotain muuta.
2m:  minulla on sinulle.
10m: minulla on sinulle jotain.
Table: Examples of better translation of prepositional phrases in 10m

In the first sentence, take you right to it, 1m covers all the information in the source sentence; especially the adverb suoraan is good, and it is missing in 10m. However, 1m fails to understand the structure to it: sen in Finnish is a genitive form of se, corresponding to its in English, which means it requires a head noun (Booij, 2007). In 10m, though the word suoraan is missing, the sentence is grammatical, because the prepositional phrase is translated into the adverb sinne, roughly there in English. In the second sentence, 1m completely fails to translate the inessive case (Karlsson, 1999), which is one of the six local cases in Finnish, referring to location inside something and marked by the ending -ssa or -ssä (ibid.). The 10m translation manages to cover the inessive case, which is a highlight of the improvement. In the third sentence, 1m neglects the prepositional phrase for you, while 10m covers the main part so that the grammar stays appropriate, though once again missing the translation of the less important word else. Though 2m does cover for you, the sentence does not make sense as a whole.
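Whether a given source phrase was learned at all can be checked directly, since a Moses phrase table is a plain-text file with "|||"-separated fields (source phrase, target phrase, model scores, and so on). A minimal sketch (the file path and the example phrase are hypothetical):

# Check whether a source phrase exists in a Moses phrase table.
# Phrase-table lines have the form: source ||| target ||| scores ||| ...
def phrase_in_table(phrase, path="train/model/phrase-table"):
    with open(path, encoding="utf-8") as table:
        for line in table:
            source = line.split("|||")[0].strip()
            if source == phrase:
                return True
    return False

# Hypothetical usage: was the phrase "down at" ever extracted from the corpus?
print(phrase_in_table("down at"))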

2. Rare words

ST:  a beautiful red fern had grown up between them.
1m:  kaunis punaisessa fern oli kasvanut välissä.
2m:  hieno punainen fern oli kasvanut.
10m: kaunis punainen hienosto oli kasvanut.

ST:  We got some kinfolk up there.
1m:  meillä on vähän kinfolk siellä.
2m:  meillä on joitakin kinfolk.
10m: meillä on vähän sukulaisilleen.

ST:  With hounds like these, you're bound to tree a few.
1m:  koirien kanssa kuin nämä, olet sidottu puu pari.
2m:  koirien kanssa, olet sidottu puu.
10m: koirien kanssa, olet sidottu puuhun.

ST:  Where in blue blazes did that come from?
1m:  missä on sininen blazes sait tuon?
2m:  missä poliisit hitto se on peräisin?
10m: missä smurffia?
Table: Examples of translations of rare words

Rare words are a common problem for all statistical machine translation. If the corpora do not cover a word which appears in the source text, the translation system fails to recognise it. This phenomenon is called out-of-vocabulary (OOV). In the first sentence, fern remains untranslated in the 1m and 2m translations, while 10m makes an attempt to translate it as hienosto, which is, however, wrong. In the second sentence, the rare word is kinfolk, which is a relatively less rare word among these examples. The 1m and 2m translations retain the original word, while 10m chooses the right word, but the case ending is wrong. Tree, in the third sentence, is less familiar: apart from being a noun, it can also be a verb, informally meaning to put someone into a difficult position. All three models translate it into puu, the nominal reading of tree, and it is not surprising that the system cannot translate it here. In the last sentence, blue is not a rare word, but it is used in a rare expression: in English, many phrases only make sense when they are used as a whole, and in blue blazes is one of those phrases. The 1m system translates it word by word, so blazes remains untranslated because there is no coverage of this word in the corpora. The 2m and 10m translations render it as poliisit hitto and smurffia respectively, which, if one were to guess, is because of the word blue, as both of the translated words have something to do with the colour blue.
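The link between corpus size and OOV can be made concrete by counting how many test-set tokens never occur in the training data. A minimal sketch (the file names are hypothetical, and a real Moses pipeline would work on tokenised, lowercased text):

# Estimate the out-of-vocabulary (OOV) rate of a test set against a training corpus.
def oov_rate(train_path, test_path):
    vocab = set()
    with open(train_path, encoding="utf-8") as train:
        for line in train:
            vocab.update(line.split())
    total = unknown = 0
    with open(test_path, encoding="utf-8") as test:
        for line in test:
            for token in line.split():
                total += 1
                if token not in vocab:
                    unknown += 1
    return unknown / total if total else 0.0

# Hypothetical comparison: a larger training corpus should give a lower OOV rate.
for corpus in ("subtitles.1m.en", "subtitles.2m.en", "subtitles.10m.en"):
    print(corpus, oov_rate(corpus, "test.en"))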

3. Too colloquial: missing subject

ST:  Feed and grain, business doing right good.
1m:  Ruokin ja grain bisnestä, oikein hyvä.
2m:  Ruokin ja vilja, tekee oikein hyvä.
10m: ruoki ja viljaa, menee.

ST:  Won the coonhunting competition three years running.
1m:  - voimanostossa coon-hunting kilpailu kolme vuotta.
2m:  - Ei coon-hunting kilpailu kolme vuotta putkeen.
10m: voitti coon-hunting kilpailu kolme vuotta.

ST:  Earned it.
1m:  Earned sen.
2m:  ansaitsi priimuksen arvon Ilmavoimien.
10m: Earned.

ST:  Didn't think hunting dogs was on the schoolwork schedule, Billy.
1m:  Ettekö usko, että jahtaat koirat oli koulukamoja, Billy.
2m:  en metsästä koirat oli koulukamoja, Billy.
10m: Etkö usko, että hyeenakoiria oli koulutehtävistä, Billy.
Table: Examples of too colloquial expressions

The source text used for this work is from a movie set in the western United States, where people use a lot of slang and talk in a more colloquial way than in cities like New York. Therefore, our translation work met many problems concerning colloquial expressions. The sentences above are chosen as representatives of expressions that lack a subject, which confuses the system a great deal. The first sentence makes sense only when it is put into context: feed and grain basically refers to agriculture. It is said in a very vivid way, as if what the speakers do every day is to feed the animals and pick the grain. In this case, the best way to express the meaning is either to find a corresponding idiom in Finnish or to rephrase the sentence more explicitly, which, however, can so far only be done by human translators. The system therefore cannot do anything but translate it word by word, and as a result the first half of the sentence makes no sense. The second half lacks a predicate verb, which makes it even harder for the system to interpret. The other three sentences share the same problem: they all lack subjects, so the systems struggle to choose the right form for the verbs.

4. Word choice

ST:  and there was a legend in those parts that a little Indian boy and girl got lost in a blizzard and died.
1m:  ja siellä oli legenda noissa sen vähän tavaroita, poika ja tyttö on kadonnut - ja kuoli.
2m:  ja siinä oli legenda siihen, että siellä on pieni poika ja tyttö on eksynyt ja kuoli.
10m: ja siellä oli legenda, intialainen poika ja tyttö on menettänyt lumimyrskyssä ja kuoli.

ST:  All right. We'll take it.
1m:  Selvä. me jatkamme.
2m:  Selvä. me jatkamme.
10m: Selvä. tehdään se.

ST:  Few more days ain't gonna hurt.
1m:  Few päivää lisää satuta.
2m:  Joitakin lisäpäivä ei satuta.
10m: pari päivää ei satuta.
Table: Examples of word choice

In statistical machine translation systems, word sense disambiguation is the task of determining the right sense of a word in a given context, and the language model is responsible for this: statistical machine translation systems consider local context in the form of language models (Koehn, 2010: 44). Nevertheless, this work still encounters the problem. In the first sentence, the three translations choose three different words for got lost: kadonnut (base form kadota) and eksynyt (base form eksyä) both refer to getting lost, but the former is closer to disappeared, while menettänyt (base form menettää) is used when someone has lost something. In the second sentence, the conversation is about buying a dog; take should not be a difficult word, but none of the translations succeeds in translating it in the sense of buy. Hurt in the third sentence means to cause harm or difficulty according to the context, while satuta corresponds to another meaning of hurt, to feel pain in a part of your body.
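How a language model resolves such ambiguities can be illustrated with a toy example: each candidate target word is scored in its context, and the candidate that yields the more probable word sequence wins. A purely illustrative sketch (the bigram counts are invented, not taken from the training corpora):

# Toy bigram language model choosing between two candidate translations of "save".
# Invented bigram counts standing in for statistics learned from target-side text.
bigram_counts = {
    ("olet", "säästänyt"): 40,   # "you have saved (money)"
    ("olet", "pelastanut"): 5,   # "you have rescued"
}
unigram_counts = {"olet": 100}

def bigram_prob(prev, word):
    """Maximum-likelihood bigram probability with a tiny floor for unseen pairs."""
    return bigram_counts.get((prev, word), 0.01) / unigram_counts[prev]

candidates = ["säästänyt", "pelastanut"]
best = max(candidates, key=lambda w: bigram_prob("olet", w))
print(best)  # the model prefers "säästänyt" in the context "Kauanko olet ..."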

5. se

ST:  Guess it's better than no dog at all.
1m:  se on parempi kuin ei koira.
2m:  se on parempi kuin ei koira.
10m: se on parempi kuin ei koira.

ST:  It's mighty nice meeting you folks.
1m:  se on todella mukava tavata.
2m:  se on todella mukava tavata.
10m: se on hienoa tavata teidät.

ST:  It'll be hard,
1m:  se on vaikeaa.
2m:  siitä tulee vaikeaa.
10m: se on vaikeaa.
Table: Examples of se

This is a special case in Finnish translation. Finnish sentences do not have to have a subject, and se usually refers to a third-person object, such as a dog or a table, so se on is not exactly the same as it is in English. As mentioned by one of the evaluators, most of these translations would be marked as possible without se, but with it they are ungrammatical. In all three sentences above, it does not refer to anything specific; it is only a formal subject.

6. Wrong segmentation

ST:  Why don't you think about that? OK?
1m:  Mikset? okei?
2m:  Miettisit?
10m: Älä mieti sitä?

ST:  I did what you said,
1m:  sanoin,
2m:  sanoin,
10m: sanoin,

ST:  Your pa know you have this?
1m:  Onko isä on?
2m:  - Tietääkö isäsi, sinulla on tämä?
10m: teidän isä on?

ST:  Well, I sure thank you,
1m:  olen varma, kiitos.
2m:  No, kiitos.
10m: No, kiitos.
Table: Examples of word segmentation

Word segmentation is also a problem commonly found in this work. The system cannot decide which part of the sentence is more important and which is less so; this depends on how the phrases in the phrase table have been aligned.

In short, four situations of adding more in-domain data are analysed in this section: impossible to possible, possible to impossible, possible to possible, and impossible to impossible. In the first category, impossible to possible, the main problem solved is that 2m and 10m cover more words than 1m, which means that by adding more data, more OOV problems are solved; in addition, more attention is paid to syntactic issues in the 10m translation. In the second category, possible to impossible, the degradation of the translation is mainly caused by information loss: many words are lost in the translations, which leads to an incomplete rendering of the source sentence as well as a disordered sentence structure. In the third category, possible to possible, most of the sentences remaining possible are simple sentences or frequently used phrases; once a sentence gets relatively more complex, the system cannot translate it. In the fourth category, impossible to impossible, quite a few grammatical structures are found to be especially difficult for all three translation systems, among which six issues are discussed: the translation of prepositional phrases, the translation of rare words, the translation of overly colloquial wording, the translation of ambiguous words, the translation of the Finnish pronoun se, and, finally, word segmentation.

Qualitative analysis of out-of-domain translation

In order to find out the difference between translations produced with the in-domain corpus and translations produced with out-of-domain corpora, the scores of the five translations are compared together, so that we know which translations are improved only in the out-of-domain systems. Based on table 32, the sentences that are better only in the out-of-domain translations are extracted.

ST:     You'll have your own hound before too long.
1m:     sinulla on oma mikä lintukoira tuollainen ennen pitkää.
2m:     sinulla on omia hurtta.
10m:    et olisi oma koira, ennen kuin on liian kauan.
bk1m:   sinulla on oma koira ennen pitkää.
parl1m: et ole oma jahdata ennen pitkää.

ST:     All right. We'll take it.
1m:     Selvä. me jatkamme.
2m:     Selvä. me jatkamme.
10m:    Selvä. tehdään se.
bk1m:   Selvä. me jatkamme.
parl1m: No niin. otetaan se.

ST:     I did what you said,
1m:     sanoin,
2m:     sanoin,
10m:    sanoin,
bk1m:   sanoin,
parl1m: tein, mitä sanoitte,
Table: Sentences that are only better in out-of-domain translations

Only three sentences are found to be better specifically in the two translations produced by the out-of-domain translation systems. The other translated sentences either have the same quality as, or worse quality than, those produced by the in-domain systems. The parl1m system in particular was expected to be the best of the five, or at least of the same quality as the 10m translation, since the parliament corpus is well aligned, with refined vocabulary and sentence structures, and contains nearly two million sentence pairs, while the book corpus has only about three thousand sentence pairs from two novels. However, as already analysed and mentioned in 5.2, the evaluation results show that the parl1m translation is not even as good as bk1m; it falls far below expectation. As a result, it turns out to be the worst one. The quantitative analysis of the evaluation results shows that its quality is more or less the same as that of the 1m translation.

From the qualitative perspective, even though parl1m does not meet the expectation, it does have some merits compared to the other translations, as shown in the following two aspects:

ST:     I was a little boy growing up deep in the Ozark mountains with my mom and dad and my two little sisters,
1m:     minä olin pikkupoika kasvaa syvään sisään ja vuoret Ozark äitini ja isäni ja kaksi pientä sisaret.
2m:     minä olin pikkupoika kasvaa, - syvällä pelätyn Ozark vuoret - ja kaksi pientä.
10m:    olin pieni poika kasvaa syvällä vuorilla Ozark äidin ja isän kanssa ja kaksi pikku sisaret.
bk1m:   minä olin pikkupoika - syvälle kasvaa Ozark vuoret äitini kanssa ja isäni ja kaksi pientä.
parl1m: minä olin pikkupoika, joka kasvaa syvälle vuorien Ozark vuoria ja äitini ja isäni ja kaksi pientä sisaret.

ST:     and there was a legend in those parts that a little Indian boy and girl got lost in a blizzard and died.
1m:     ja siellä oli legenda noissa sen vähän tavaroita, poika ja tyttö on kadonnut - ja kuoli.
2m:     ja siinä oli legenda siihen, että siellä on pieni poika ja tyttö on eksynyt ja kuoli.
10m:    ja siellä oli legenda, intialainen poika ja tyttö on menettänyt lumimyrskyssä ja kuoli.
bk1m:   ja siellä oli legenda niihin, että vähän intialainen poika ja tyttö on eksyväni ja kuoli.
parl1m: ja siellä oli legenda niillä alueilla, jotka hieman Intian poika ja tyttö katosi ja kuoli.

ST:     We know that. Your uncle Fred is the one that sent us out here.
1m:     emme tiedä sitä. teidän setä Fred on lähetti meidät tänne.
2m:     emme tiedä sitä. teidän setä Fredin lähetti meidät tänne.
10m:    että Teidän Fred setä lähetti meidät tänne.
bk1m:   emme tiedä sitä. teidän setä Fred on ainoa, joka lähetti meidät tänne.
parl1m: tiedämme, että hän on. teidän kanssa, joka lähetti meidät tänne.
Table: Merits of parl1m in syntax

Parl1m is relatively better at translating participle clauses and other subordinate structures. In the first sentence, I was a little boy growing up..., growing is a participle construction which is used when the participle and the verb in the main clause share the same subject. Finnish does not have exactly this construction, so translating it with a relative clause (joka kasvaa) is not a bad idea. In sentences 2 and 3, only the out-of-domain systems capture the hypotaxis of the source sentences.

ST:     That is, of course, if we have your permission.
1m:     se on tietenkin, jos emme tarvitse lupaasi.
2m:     se on tietenkin, jos meillä on lupaasi.
10m:    se on tietenkin, jos meillä on lupa.
bk1m:   se on tietysti, jos meillä on lupaasi.
parl1m: se on tietenkin, jos sallitte.

ST:     Well, I guess it won't hurt nothing.
1m:     No, kai se ole mitään.
2m:     Kaipa se mitään.
10m:    No, se ei satu mitään.
bk1m:   No, se ei satuta mitään.
parl1m: No, se ei haittaa mitään.

ST:     Well, that's a real shame.
1m:     Sepä harmi.
2m:     Sepä harmi.
10m:    No, se on sääli.
bk1m:   Sepä harmi.
parl1m: tämä on todellinen häpeä.
Table: Merits of parl1m in word choice

In word choice, parl1m is slightly different from the other translations. In the first sentence, the first four translations are word-by-word renderings, which is not wrong here, while parl1m uses the word sallitte (in English, you allow) to replace we have your permission, which makes the sentence more natural, without a translated tone. The word hurt has already been discussed in the analysis above.

ST:     You done your schoolwork?
1m:     Oletko tehnyt sinun koulukamoja.
2m:     - Olet tehnyt koulukamoja.
10m:    Oletko tehnyt läksysi?
bk1m:   Oletko tehnyt sinun koulukamoja.
parl1m: Oletko tehnyt teidän koulukamoja.

ST:     I want you to understand something, Billy.
1m:     haluan, että ymmärrät jotain, Billy.
2m:     haluan sinun ymmärtävän jotain, Billy.
10m:    haluan sinun ymmärtävän jotain, Billy.
bk1m:   haluan, että ymmärrät jotain, Billy.
parl1m: haluan teidän ymmärtävän, Billy.
Table: Translation of you

Another interesting phenomenon is that in parl1m most instances of the pronoun you are translated in a more formal, respectful way (teidän). This is probably also due to the genre of the corpus, whose language is more standard and normalised.

As can be seen, from the qualitative perspective the out-of-domain translations do not outperform the in-domain translations to any great extent, as only three sentences are found to be better in the out-of-domain translations, though they have a few merits in syntax and word choice.

6.4 Analysis of the 5-point scale study

As mentioned in the method chapter, a small additional study is conducted to support the evaluation results. Each sentence is scored on a scale of 1 to 5 in order to eliminate the vagueness of being merely possible or impossible, and a statistical result is obtained, as the following figure shows:

Figure 3: Scores of the five translations on the 5-point scale

The five colours represent scores 1 to 5 respectively, from left to right. Generally, score 1 is the most frequent, with translations 1m, 2m and parl1m reaching the highest point and 10m the lowest, meaning that those three translations have the most low-score translations, which is consistent with the earlier finding that those three have the most impossible translations. On the other hand, 10m has the greatest frequency of score 5, while parl1m has the lowest, indicating that the 10m translation is of the best quality while parl1m is of the worst.
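The frequencies behind such a figure are simple tallies: for each system, count how often each score from 1 to 5 was assigned. A minimal sketch (the score lists are hypothetical placeholders for the actual evaluation data):

from collections import Counter

# Hypothetical 1-5 scores per sentence for each of the five systems.
scores = {
    "1m":     [1, 1, 2, 5, 3, 1],
    "2m":     [1, 2, 2, 5, 3, 1],
    "10m":    [2, 3, 4, 5, 5, 1],
    "bk1m":   [1, 2, 2, 4, 3, 1],
    "parl1m": [1, 1, 2, 3, 3, 1],
}

# Frequency of each score per system, the quantity plotted in figure 3.
for system, values in scores.items():
    counts = Counter(values)
    print(system, [counts.get(s, 0) for s in range(1, 6)])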


Lisätiedot

OFFICE 365 OPISKELIJOILLE

OFFICE 365 OPISKELIJOILLE OFFICE 365 OPISKELIJOILLE Table of Contents Articles... 3 Ohjeet Office 365 käyttöönottoon... 4 One Driveen tallennetun videon palauttaminen oppimisympäristön palautuskansioon... 5 Changing default language

Lisätiedot

1.3Lohkorakenne muodostetaan käyttämällä a) puolipistettä b) aaltosulkeita c) BEGIN ja END lausekkeita d) sisennystä

1.3Lohkorakenne muodostetaan käyttämällä a) puolipistettä b) aaltosulkeita c) BEGIN ja END lausekkeita d) sisennystä OULUN YLIOPISTO Tietojenkäsittelytieteiden laitos Johdatus ohjelmointiin 81122P (4 ov.) 30.5.2005 Ohjelmointikieli on Java. Tentissä saa olla materiaali mukana. Tenttitulokset julkaistaan aikaisintaan

Lisätiedot

Salasanan vaihto uuteen / How to change password

Salasanan vaihto uuteen / How to change password Salasanan vaihto uuteen / How to change password Sisällys Salasanakäytäntö / Password policy... 2 Salasanan vaihto verkkosivulla / Change password on website... 3 Salasanan vaihto matkapuhelimella / Change

Lisätiedot

Valuation of Asian Quanto- Basket Options

Valuation of Asian Quanto- Basket Options Valuation of Asian Quanto- Basket Options (Final Presentation) 21.11.2011 Thesis Instructor and Supervisor: Prof. Ahti Salo Työn saa tallentaa ja julkistaa Aalto-yliopiston avoimilla verkkosivuilla. Muilta

Lisätiedot

HARJOITUS- PAKETTI A

HARJOITUS- PAKETTI A Logistiikka A35A00310 Tuotantotalouden perusteet HARJOITUS- PAKETTI A (6 pistettä) TUTA 19 Luento 3.Ennustaminen County General 1 piste The number of heart surgeries performed at County General Hospital

Lisätiedot

Transport climate policy choices in the Helsinki Metropolitan Area 2025

Transport climate policy choices in the Helsinki Metropolitan Area 2025 Transport climate policy choices in the Helsinki Metropolitan Area 2025 views of transport officials and politicians Vilja Varho Introduction Experts have doubts about whether sufficiently effective policies

Lisätiedot

Hotel Sapiens (Finnish Edition)

Hotel Sapiens (Finnish Edition) Hotel Sapiens (Finnish Edition) Leena Krohn Click here if your download doesn"t start automatically Hotel Sapiens (Finnish Edition) Leena Krohn Hotel Sapiens (Finnish Edition) Leena Krohn Leena Krohnin

Lisätiedot

HUMAN RESOURCE DEVELOPMENT PROJECT AT THE UNIVERSITY OF NAMIBIA LIBRARY

HUMAN RESOURCE DEVELOPMENT PROJECT AT THE UNIVERSITY OF NAMIBIA LIBRARY HUMAN RESOURCE DEVELOPMENT PROJECT AT THE UNIVERSITY OF NAMIBIA LIBRARY Kaisa Sinikara, University Librarian, Professor and Elise Pirttiniemi, Project Manager, Helsinki University Library Ellen Namhila,

Lisätiedot

Lab SBS3.FARM_Hyper-V - Navigating a SharePoint site

Lab SBS3.FARM_Hyper-V - Navigating a SharePoint site Lab SBS3.FARM_Hyper-V - Navigating a SharePoint site Note! Before starting download and install a fresh version of OfficeProfessionalPlus_x64_en-us. The instructions are in the beginning of the exercise.

Lisätiedot

A new model of regional development work in habilitation of children - Good habilitation in functional networks

A new model of regional development work in habilitation of children - Good habilitation in functional networks A new model of regional development work in habilitation of children - Good habilitation in functional networks Salla Sipari, PhD, Principal Lecturer Helena Launiainen, M.Ed, Manager Helsinki Metropolia

Lisätiedot

Co-Design Yhteissuunnittelu

Co-Design Yhteissuunnittelu Co-Design Yhteissuunnittelu Tuuli Mattelmäki DA, associate professor Aalto University School of Arts, Design and Architecture School of Arts, Design and Architecture design with and for people Codesign

Lisätiedot

Elämä on enemmän kuin yksi ilta (Finnish Edition)

Elämä on enemmän kuin yksi ilta (Finnish Edition) Elämä on enemmän kuin yksi ilta (Finnish Edition) Maria Calabria Click here if your download doesn"t start automatically Elämä on enemmän kuin yksi ilta (Finnish Edition) Maria Calabria Elämä on enemmän

Lisätiedot