81:001.103](480)

ISSN 0548-0027 [...] 2 [...] 2003 [...] 6 [...] 37

[...] 40 [...] 60 [1, 2]. [...] ([3, 4, 5, 6]; [...] "Korpuslingvistiikan työpaja 1: Korpukset ja ohjelmat" [7, pp. 126-134]). [...] [8; 1, pp. 50-54, 62-64]. [...] ([...] Unix, [...] Unix [...] "F-Secure SSH Client"). [...]

[...] ("a running text") [...]. "So a corpus in modern linguistics, in contrast to being simply any body of text, might more accurately be described as a finite-sized body of machine-readable text, sampled in order to be maximally representative of the language variety under consideration" [9, p. 24]. [...]

FINNISH IT CENTER FOR SCIENCE (CSC)
http://www.csc.fi

CSC [...] 2 [...] CSC [...] CSC [...] 60 [...]. CSC [...] "Kielipankki" [...] (http://www.csc.fi/kielipankki) [...] 747 236, 210539150. 86% (179602389). ([...] CSC [...] [10]).

2 Cedar [...] 128, 76.8, 160 [...] (http://www.csc.fi/metacomputer/laitetiedot.html.en).

Suomen kielen tekstipankki (SKTP)
http://www.csc.fi/sktp

[...] 2001. SKTP [...] 180 [...] (35 [...] 2000). [...] SGML [...] TEI: [...] (1990-2000). [...] TEXTMORFO [...] SWESG. [...] http://www.csc.fi/sktp/sktp tekstit.html. [...] http://www.csc.fi/kielipankki/aineistot/naytaaineistot.phtml [...] SKTP LT ("Suomen kielen tekstipankin leksikaalinen tietokanta"). [...] LEMMIE ([...] [11]).

Oulun korpus [12]
http://www.csc.fi/kielipankki/julkaisut/opas/c623.phtml#aen625

[...] 80 [...] XX [...]
429058 [...] 5800 [...]. 1997 [...] SGML [...] CSC. [...] CQP (Corpus Query Processor).

Suomen murteiden morfologinen digitaaliarkisto
http://www.joensuu.fi/fld/methodsxi/abstracts/kukkola.html

[...] 1967. [...] 2 [...] 1000 [...]. SGML [...].

Keskiranskan korpus
http://www.csc.fi/kielipankki/julkaisut/opas/x693.phtml

[...] XIV XVI [...]. 29 [...] 1 [...]. CSC [...] 14 [...] (430 [...]). [...]

LE PAROLE
http://www.hltcentral.org/usr docs/project source/parole/parolefinal.pdf

[...] 11 [...] 20 [...]. PAROLE FI [...] CSC (PAROLE GE). [...] CSC [...]

Susanne corpus (http://www.cogs.susx.ac.uk/users/geoffs/suedoc.html)
CHILDES (http://childes.psy.cmu.edu/)
The Oxford Text Archive (http://ota.ahds.ac.uk/)
Le Monde (http://www.icp.inpg.fr/elra/)
WordNet (http://www.cogsci.princeton.edu/ [...]

The University of Helsinki Language Corpus Server (UHLCS)
http://www.ling.helsinki.fi/uhlcs

[...] 1980 [...] HKV ([...] [13]). [...] 80 [...] ("Databank for endangered Finno-Ugric languages", "LENCA group project", "A Finland-Swedish Corpus (FISC)") [...] (Two-level Grammar; [...] [14]) [...] (Constraint Grammar; [...] [15]) ([...] http://www.lingsoft.fi/doc/). [...] UHLCS [...] 50 [...] ([...] http://www.ling.helsinki.fi/uhlcs/data/languages.html). [...] 2000 [...] ([...] http://www.ling.helsinki.fi/uhlcs/tools/tools.html). [...] UHLCS [...]: Helsinki Corpora I [...] Helsinki Corpora II.

Helsinki Corpora I
http://www.ling.helsinki.fi/uhlcs/data/helsinkicorpora I.html

[...]
[...]

Helsinki Corpora II
http://www.ling.helsinki.fi/uhlcs/data/helsinkicorpora II.html

[...] "Databank for endangered Finno-Ugric languages" ([...] [16]). [...] UHLCS [...] (http://www.ling.helsinki.fi/uhlcs/readme all/README Russian.html). [...] SGML. [...] 1999-2000 [...] RUSTWOL [17] [...] Lingsoft. [...] 200 [...] SGML [...] 3 [...] 100 [...] Ryscard [...] 400 [...] 1.5 [...]

http://www.kotus.fi/inenglish/ [...] (Kotimaisten kielten tutkimuskeskus, KOTUS)

[...] 76 [...] CSC [...] [18] [...] 1987 [...] (SGML) [...] 100 [...] (2100 000 [...]) (50-80 [...]). [...] XIX [...] 1810-1900 [...] (http://www.kotus.fi/aineistot/18qo/180qsahkoisetaineistot.shtml). [...] XVI XVIII [...] 3.2 [...] (http://www.kotus.fi/aineistot/vks sahkoinenaineisto.shtml). [...] [19] [...] (1100 000 [...] 200 [...]). [...] (http://www.utu.fi/hum/suomi/kokoelma.htm#la) [...] 250 [...] 40 [...] 10% [20]. [...] KWIC [...] AGREP.

http://www.eng.helsinki.fi/index.htm
[...]

The Helsinki Corpus of English Texts: Diachronic Part [21] [...] (750 [...]) [...] XVIII [...] (1710 [...]). [...] 1984 [...] 25 [...] [22].

The Helsinki Corpus of English Texts: the Dialect Corpus (HD) [23] [...] 1970-80 [...] [24]. [...] 800 [...].

The Corpus of Early English Correspondence (CEEC) [25, 26, 27] [...] 1993-2000 [...] 2.7 [...] 1417-1681. [...] XVII XVIII [...] "Sociolinguistics and Language History" [...].

The Corpus of Early English Medical Writing [28] [...] 1375-1750 [...] 1.5 [...].

The Helsinki Corpus of Older Scots (HCOS) [29] [...] 1450-1700 [...] 830 [...].

[...] [30]
http://www.utu.fi/hum/sgr/volgapalvengl.htm

[...] MORMULA [...] 200 [...]. MARKO [...] 1 [...]. ONCHYKO [...] 1996-1999 [...] 16 [...] (XIX [...]).

The Neo-Assyrian Text Corpus Project
http://www.helsinki.fi/science/saa/cna.html

[...] (G [...] State Archives of Assyria). [...] ANSII 3358547.

The Corpus of 19th-century English (CONCE) [31]

[...] (1800-1830, 1830-1870, 1870-1900) [...] 1 [...]
[...]

Suomen Kansan Vanhat Runot (SKVR)
http://www.oszk.hu/ujdonsag/firm/eng/klemettinen.html

[...] "Suomen kansan vanhat runot, SKVR" [...] XML [...] 150 [...].

The Electronic Corpus of Ingrian Finnish
http://helmer.hit.uib.no/ingrisk/ingrian.html

[...] [32]. [...]

Corpus Cyrillo-Methodianum Helsingiense
http://www.slav.helsinki.fi/ccmh/

[...]

The Tampere Bilingual Corpus of Finnish and English [33]

[...] "Sinuhe egyptiläinen" [...]

PARRU S [34]

[...] 5 [...]

http://www.slav.helsinki.fi/hanko...

[...] 100 [...]

References

1. Kieliteknologia Suomessa / Ed. by M. Miettinen. Helsinki, 1998.
2. The History of Linguistics in the Nordic Countries / Ed. by C. Henriksen, E. Hovdhaugen, F. Karlsson, B. Sigurd. Helsinki, 2000.
3. Koskenniemi K. Tietokonelingvistiikan vaihteet Suomessa. Tietoyhteys 1997, 3. P. 7-8.
4. Miettinen M. Kieliteknologia tarvitsee Kielipankkia. Tietoyhteys 1997, 3. P. 5-6.
5. Lehtinen M. Tietokoneet sanakirjatyössä. Tietoyhteys 1997, 3. P. 9-10.
6. Lehtinen M., Karvonen P., Rahikäinen T. Raportti tekstikorpusten koostamisperiaatteista ja nykysuomen tekstiaineistojen tarpeellisuudesta Kotimaisten kielten tutkimuskeskuksessa. Helsinki, 1995.
7. XXIX Kielitieteen päivät. Helsinki, 2002.
8. Arppe A. No Single Path - Finnish Lessons in the Commercialisation of Language Engineering Research. 2002. [Available at http://www.csc.fi/euromap/artikkelit/arppe.phtml.en].
9. McEnery T. and Wilson A. Corpus Linguistics. Edinburgh, 1996.
10. Miettinen M. Kielipankin asiakkaan opas. Espoo, 2000. [Available at http://www.csc.fi/kielipankki/julkaisut/opas/index.phtml].
11. Grönroos M. The Lexical Database of the Bank of Finnish. Helsinki, 2000.
12. Oulun korpus: 1960-luvun suomen yleiskielen tutkimusmateriaali / Ed. by P. Saukkonen. Oulu, 1982.
13. Hakulinen A., Karlsson F., Vilkuna M. Suomen tekstilauseiden piirteitä, kvantitatiivinen tutkimus. Helsinki, 1980. [= Publications 6. Department of General Linguistics].
14. Koskenniemi K. Two-Level Morphology, A General Computational Model for Word-Form Recognition and Production. Helsinki, 1983. [= Publications 11. Department of General Linguistics].
15. Constraint Grammar: A Language-Independent Framework for Parsing Unrestricted Text / Ed. by F. Karlsson, A. Voutilainen, J. Heikkilä, A. Anttila. Berlin: Mouton de Gruyter, 1995.
16. Suihkonen P. Documentation of the Computer Corpora of the Uralic Languages at the University of Helsinki. Helsinki: Department of General Linguistics, 1998. [Technical Reports TR 2].
17. Viikki L. RUSTWOL, A System for Automatic Recognition of Russian Words, 1997. [Available at www.lingsoft.fi/doc/rustwol/rustwol.txt].
18. Suomen kielen perussanakirja. Helsinki, 1990-1994, I-III.
19. Suomen murteiden sanakirja. Helsinki, 1985.
20. Miikkulainen R. The Database of Finnish Toponyms. Proceedings of the XIXth International Congress of Onomastic Sciences (Aberdeen, August 4-11, 1996), 2. P. 248-255. Aberdeen, 1998. [Available at http://www.kotus.fi/aineistot/nimiarkisto/paikannimet/toponyms.shtml].
21. Manual to the Diachronic Part of the Helsinki Corpus of English Texts. Coding Conventions and Lists of Source Texts / Kytö M. (ed.), 3rd ed. Helsinki, 1996. [Available at http://khnt.hit.uib.no/icame/manuals/hc/index.htm].
22. Kytö M. and Rissanen M. English Historical Corpora, Report on Developments in 1999. ICAME Journal, Computers in English Linguistics 2000, 24. P. 159-175.
23. Peitsara K. and Vasko A. The Helsinki Dialect Corpus, Characteristics of Speech and Aspects of Variation. Helsinki English Studies, The Electronic Journal of the Department of English at the University of Helsinki 2002, 2. [Available at http://www.eng.helsinki.fi/hes/corpora/helsinki dialect corpus.htm].
24. Orton H. Survey of English Dialects. Leeds, 1962.
25. Nevalainen T. & Raumolin-Brunberg H. Sociolinguistics and language history, The Helsinki Corpus of Early English Correspondence. Hermes: Journal of Linguistics 1994, 13. P. 135-143.
26. Manual for the Corpus of Early English Correspondence Sampler CEECS / Ed. by A. Nurmi. Helsinki, 1998. [Available at http://www.hit.uib.no/icame/ceecs/index.htm].
27. Laitinen M. Extending the Corpus of Early English Correspondence to the 18th Century. Helsinki English Studies, The Electronic Journal of the Department of English at the University of Helsinki 2002, 2. [Available at http://www.eng.helsinki.fi/hes/corpora/extending the_corpus.htm].
28. Taavitsainen I. and Pahta P. Corpus of Early English Medical Writing 1375-1750. ICAME Journal, Computers in English Linguistics 1997, 21. P. 71-79.
29. Meurman-Solin A. A New Tool, the Helsinki Corpus of Older Scots (1450-1700). ICAME Journal, Computers in English Linguistics 1995, 19. P. 49-63.
30. Moisio A. and Luutonen J. Turun yliopiston volgakielten korpukset. XXIII Kielitieteen päivät, 123. Helsinki, 1996.
31. Kytö M., Rudanko J., Smitterberg E. Building a Bridge between the Present and the Past, a Corpus of 19th-century English. ICAME Journal, Computers in English Linguistics 2000, 24. P. 85-97.
32. Cooper W. R. The Tampere Bilingual Corpus of Finnish and English, Development and Applications. Compare or Contrast? Current Issues in Cross-language Research. Tampere, 1998.
33. Savijärvi I. Western Ingria - Where Languages and Dialects Meet. [Available at http://helmer.hit.uib.no/ingrisk/western.html].
34. Mikhailov M. and Tommola H. Compiling Parallel Text Corpora, towards Automation of Routine Procedures. International Journal of Corpus Linguistics 2001, 6. P. 69-77.