MS-A0504 First course in probability and statistics

MS-A0504 First course in probability and statistics
Week 6: Statistical dependence and linear regression
Heikki Seppälä
Department of mathematics and system analysis, School of science, Aalto University
Spring 2016

Contents
- Description of data set of two variables
- Least squares method
- Linear regression


Describing the data set of two variables
Collected data: n observed units, p variables. Choose two variables for analysis, which means that we analyse a data set (x, y) consisting of the pairs $(x_1, y_1), \dots, (x_n, y_n)$.

Example. Evaluation of the course
Can we predict exam points from exercise points?

id   exam (y)   report   exercises (x)   grade
 1       0         0          0            0
 2      17         5         20            5
 3      15         5          0            3
 4      12         6         16            4
 5      19         5         20            5
 6      21         6         17            5
 7       0         0          3            0
 8      13         6          9            4
 9      19         6         12            5
10       0         0          0            0
11      15         5         19            5
12      12         6          0            3
13      13         5         17            4

Input (explanatory): x = (0, 20, 0, 16, 20, 17, 3, 9, 12, 0, 19, 0, 17)
Output (dependent): y = (0, 17, 15, 12, 19, 21, 0, 13, 19, 0, 15, 12, 13)

Scatter plot
Data points: $(x_1, y_1), \dots, (x_n, y_n)$

Sample covariance
The sample covariance of data vectors x and y is defined by
$$s(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - m(x))(y_i - m(y)),$$
where m(x) and m(y) are the sample means of the data vectors.
Remark:
- $s(x, x) = s^2(x)$ is the sample variance of x
- $s(y, y) = s^2(y)$ is the sample variance of y
- $\sqrt{s(x, x)} = s(x)$ is the sample standard deviation of x
- $\sqrt{s(y, y)} = s(y)$ is the sample standard deviation of y

Example. Evaluation of the course (same data as above)
The sample covariance: s(x, y) = cov(x,y) = 43.67
We need to normalise this to be able to interpret it.
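The covariance value can be checked by hand from the definition. The following is a minimal Python sketch (the slides themselves use R's cov); the helper names sample_mean and sample_cov are made up for this illustration.

```python
# Exercise points (x) and exam points (y) from the course example.
x = [0, 20, 0, 16, 20, 17, 3, 9, 12, 0, 19, 0, 17]
y = [0, 17, 15, 12, 19, 21, 0, 13, 19, 0, 15, 12, 13]


def sample_mean(v):
    """Arithmetic mean m(v) of a data vector."""
    return sum(v) / len(v)


def sample_cov(u, v):
    """Sample covariance s(u, v) with the 1/(n-1) normalisation."""
    mu, mv = sample_mean(u), sample_mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


print(round(sample_cov(x, y), 2))  # 43.67, the value quoted on the slide
print(round(sample_cov(x, x), 2))  # s(x, x) is the sample variance of x
```

Calling sample_cov(x, x) illustrates the remark that the covariance of a vector with itself is its sample variance.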

Sample correlation
Pearson's sample correlation of data vectors x and y is defined by
$$r(x, y) = \frac{s(x, y)}{s(x)\, s(y)} \in [-1, +1].$$
(Karl Pearson FRS, 1857-1936)
Pearson's correlation measures linear dependence:
- If r(x, y) > 0, then x and y are positively correlated
- If r(x, y) = 0, then x and y are uncorrelated
- If r(x, y) < 0, then x and y are negatively correlated

Example. Evaluation of the course (same data as above)
Pearson's sample correlation: r(x, y) = cor(x,y) = 0.694
Exercise points and exam points appear to be positively correlated. Or is this caused by random variation?
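As a rough check of the correlation value, the normalisation can be written out in a few lines of Python. This is only a sketch of the definition; pearson_r is a hypothetical helper name, not something from the slides (which use R's cor).

```python
import math

# Exercise points (x) and exam points (y) from the course example.
x = [0, 20, 0, 16, 20, 17, 3, 9, 12, 0, 19, 0, 17]
y = [0, 17, 15, 12, 19, 21, 0, 13, 19, 0, 15, 12, 13]


def pearson_r(u, v):
    """Pearson's sample correlation r(u, v) = s(u, v) / (s(u) s(v))."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    # The 1/(n-1) factors cancel between numerator and denominator.
    return suv / math.sqrt(suu * svv)


print(round(pearson_r(x, y), 3))  # 0.694
```

Because the 1/(n-1) factors cancel, the correlation does not depend on the units or scale of the two variables, which is what makes it interpretable where the raw covariance is not.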

Testing for correlations
Model assumption (stochastic model): the observed pairs $(x_i, y_i)$ are realizations of independent random vectors $(X_i, Y_i) \sim N_2(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho_{XY})$.
$H_0: \rho_{XY} = 0$ vs. $H_1: \rho_{XY} \neq 0$
(William S. Gosset, a.k.a. "Student", 1876-1937)
If the model assumption and the null hypothesis hold, the test statistic
$$t(X, Y) = r(X, Y) \sqrt{\frac{n-2}{1 - r(X, Y)^2}}$$
is t-distributed with n - 2 degrees of freedom. If the absolute value of the test statistic is large, then it is unlikely that the null hypothesis is true.

Example. Evaluation of the course (same data as above)
Is the joint distribution of exam and exercise points a bivariate normal distribution? No. Both variables are discrete, and usually neither of them is symmetric. In this case we cannot test the correlation using the test above. However, there are non-parametric tests that can also be used in this setting (course MS-C2104 Introduction to Statistical Inference).

Example. Heights of fathers and sons
[Figure: scatter plot "Height", son's height plotted against father's height]
Are the heights of fathers and sons from a bivariate normal distribution?

Example. Heights of fathers and sons
[Figure: plot of f over the axes Father and Son]

Example. Heights of fathers and sons
[Figure: "Histogram of Fathers" and "Histogram of Sons"; horizontal axis Height, vertical axis Density]

Example. Heights of fathers and sons
Are the heights of fathers and sons from a bivariate normal distribution? Yes, or at least the bivariate normal distribution provides an accurate enough approximation. We can therefore test the correlation using the test above.
- Sample correlation: cor(x,y) = 0.498
- Test statistic calculated from the data: t(x, y) = 18.85
- p-value: $\Pr(|t(X, Y)| \geq 18.85)$ = 2*(1-pt(18.85,1076)) ≈ 0
Since the p-value is less than 0.01, the null hypothesis ($\rho_{XY} = 0$) is rejected at the 1 % significance level.
Conclusion: the heights of fathers and sons are linearly dependent.
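The test statistic can be reproduced approximately from the rounded correlation. The sketch below makes two assumptions that are not stated on the slide: the sample size n = 1078 is inferred from the 1076 degrees of freedom in the R call, and the p-value uses a standard normal approximation to the t-distribution (reasonable only because the degrees of freedom are so large). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2) for a sample correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.498   # rounded sample correlation from the slide
n = 1078    # assumption: inferred from n - 2 = 1076 degrees of freedom

t = correlation_t_statistic(r, n)

# Assumption: with 1076 degrees of freedom the t-distribution is very close
# to the standard normal, so the two-sided p-value is approximated with it.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(round(t, 1), p_approx)
```

The statistic comes out near 18.8 rather than exactly 18.85 because the correlation is entered with three decimals only; the approximate p-value is numerically indistinguishable from zero either way.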

Least squares method

Example. Evaluation of the course (same data as above)
Pearson's sample correlation: r(x, y) = 0.694
The linear dependence between the variables is fairly strong. What is the best line for illustrating the linear dependence?

Scatter plot
Data points: $(x_1, y_1), \dots, (x_n, y_n)$

Fitting the line
Fitted values: $\hat{y}_i = \beta_0 + \beta_1 x_i$

Residuals
Residuals: $e_i = y_i - \hat{y}_i$

Minimization of residuals
How to choose the optimal slope $\beta_1$ and constant $\beta_0$?

Minimization of residuals
Sum of squares of residuals of the line $\hat{y} = \beta_0 + \beta_1 x$:
$$SSE(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
Least squares method: find $(\beta_0, \beta_1)$ such that the sum of squared residuals is minimized.
Solution: differentiate $SSE(\beta_0, \beta_1)$ with respect to $\beta_0$ and $\beta_1$, set both derivatives to zero and solve these equations.
Answer: $(\beta_0, \beta_1) = (b_0, b_1)$, where
$$b_1 = r(x, y) \frac{s(y)}{s(x)}, \qquad b_0 = m(y) - b_1 m(x).$$
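The differentiation step is not written out on the slide. A sketch of how it might go, using only the definitions of m, s and r given earlier:

```latex
\begin{align*}
\frac{\partial SSE}{\partial \beta_0}
  &= -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0
  &&\Longrightarrow\quad \beta_0 = m(y) - \beta_1 m(x), \\
\frac{\partial SSE}{\partial \beta_1}
  &= -2 \sum_{i=1}^{n} x_i\,(y_i - \beta_0 - \beta_1 x_i) = 0 .
\end{align*}
% Substitute \beta_0 = m(y) - \beta_1 m(x) into the second equation and use
% \sum_i m(x)\,\bigl[(y_i - m(y)) - \beta_1 (x_i - m(x))\bigr] = 0:
\begin{align*}
\sum_{i=1}^{n} (x_i - m(x))(y_i - m(y))
  &= \beta_1 \sum_{i=1}^{n} (x_i - m(x))^2 \\
\Longrightarrow\quad
\beta_1 &= \frac{s(x, y)}{s^2(x)}
         = \frac{r(x, y)\, s(x)\, s(y)}{s^2(x)}
         = r(x, y)\,\frac{s(y)}{s(x)} .
\end{align*}
```

Since SSE is a convex quadratic function of $(\beta_0, \beta_1)$, this stationary point is a minimum; it is unique whenever $s(x) > 0$.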

Example. Evaluation of the course (same data as above)
Sample means: m(x) = 10.2, m(y) = 12.0
Sample standard deviations: s(x) = 8.51, s(y) = 7.39
Pearson's sample correlation: r(x, y) = 0.694
$b_1 = r(x, y) \frac{s(y)}{s(x)} = 0.60$
$b_0 = m(y) - b_1 m(x) = 5.82$
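The same coefficients can be obtained directly from the raw data. The sketch below uses the equivalent form $b_1 = s(x, y) / s^2(x)$; the function name fit_line is an assumption made for this illustration. The results should agree with the slide values up to rounding.

```python
# Exercise points (x) and exam points (y) from the course example.
x = [0, 20, 0, 16, 20, 17, 3, 9, 12, 0, 19, 0, 17]
y = [0, 17, 15, 12, 19, 21, 0, 13, 19, 0, 15, 12, 13]


def fit_line(x, y):
    """Least squares intercept b0 and slope b1 of the line y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx        # equals r(x, y) * s(y) / s(x)
    b0 = my - b1 * mx
    return b0, b1


b0, b1 = fit_line(x, y)
print(round(b1, 2))  # slope, about 0.60
print(b0)            # intercept, close to the slide's 5.82
```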

Example: Heights of fathers and sons
[Figure: scatter plot "Height", son's height plotted against father's height]
Sample means: m(x) = 171.92, m(y) = 174.46
Sample standard deviations: s(x) = 6.98, s(y) = 7.14
Pearson's sample correlation: r(x, y) = 0.498
$b_1 = r(x, y) \frac{s(y)}{s(x)} = 0.514$
$b_0 = m(y) - b_1 m(x) = 86.83$

Example: Heights of fathers and sons
[Figure: scatter plot "Height", son's height plotted against father's height]

Linear regression

Prediction interval of fitted line
If we fit a line to a data set of two variables using the least squares method, how accurately does this line predict the values of the response variable? How likely is it that the fitted value is close to the observed value? To answer these questions we need a stochastic model of the statistical experiment.

Linear regression model
Suppose that the response variable Y depends on the input x as follows:
$$Y = \beta_0 + \beta_1 x + \epsilon, \quad \text{where } \epsilon \sim N(0, \sigma^2).$$
We take n independent measurements with input values $x_1, \dots, x_n$ and obtain the values
$$Y_k = \beta_0 + \beta_1 x_k + \epsilon_k, \quad k = 1, \dots, n,$$
where the random residuals $\epsilon_1, \dots, \epsilon_n$ of the stochastic model are independent and $N(0, \sigma^2)$-distributed.
There are 3 unknown parameters: $(\beta_0, \beta_1, \sigma^2)$.

Estimation of parameters of linear regression model
The best estimators of the parameters $\beta_0, \beta_1$ in the sense of expected squared residuals are the least squares estimators
$$b_1 = r(x, y) \frac{s(y)}{s(x)}, \qquad b_0 = m(y) - b_1 m(x).$$
An unbiased estimator of the unknown variance parameter $\sigma^2$ is
$$S^2 = \frac{1}{n-2} \sum_{k=1}^{n} (y_k - \hat{y}_k)^2 = \frac{1}{n-2} \sum_{k=1}^{n} (y_k - b_0 - b_1 x_k)^2.$$
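To illustrate the arithmetic of the variance estimator, here is a small Python sketch applied to the course data. It is only a numerical illustration of the formula (the slides point out that the normality assumption does not hold for this data set); the function names are hypothetical.

```python
# Exercise points (x) and exam points (y) from the course example.
x = [0, 20, 0, 16, 20, 17, 3, 9, 12, 0, 19, 0, 17]
y = [0, 17, 15, 12, 19, 21, 0, 13, 19, 0, 15, 12, 13]


def fit_line(x, y):
    """Least squares intercept b0 and slope b1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def residual_variance(x, y, b0, b1):
    """S^2 = SSE / (n - 2): two degrees of freedom are used by b0 and b1."""
    sse = sum((yk - b0 - b1 * xk) ** 2 for xk, yk in zip(x, y))
    return sse / (len(x) - 2)


b0, b1 = fit_line(x, y)
s2 = residual_variance(x, y, b0, b1)
print(s2, s2 ** 0.5)  # estimate of sigma^2 and of sigma
```

The divisor is n - 2 rather than n - 1 because two parameters of the line are estimated from the same data before the residuals are formed.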

Prediction interval of fitted value of response
We want to predict the value $Y(\tilde{x})$ of the response variable corresponding to the input value $\tilde{x}$, based on the observed data set $(x_1, \dots, x_n; y_1, \dots, y_n)$.
The predicted value is $\hat{Y}(\tilde{x}) = b_0 + b_1 \tilde{x}$, where $b_0, b_1$ are estimated from the data using the least squares method.
The end points of the $(1 - \alpha)$ prediction interval for the response are
$$b_0 + b_1 \tilde{x} \pm t_{\alpha/2}\, S \sqrt{1 + \frac{1}{n} + \frac{(\tilde{x} - m(x))^2}{(n-1)\, s^2(x)}},$$
where $t_{\alpha/2}$ is the number for which a $t(n-2)$-distributed random variable T satisfies $\Pr(-t_{\alpha/2} \leq T \leq t_{\alpha/2}) = 1 - \alpha$.
Remark: the prediction interval is wider if $\tilde{x}$ is far from the sample mean m(x) of the observed data.
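The interval formula can be sketched in Python as follows, using the father/son summary values quoted on the slides. Several inputs are assumptions rather than slide content: n = 1078 is inferred from the 1076 degrees of freedom, S is derived from the identity $SSE = (n-1)\, s^2(y)\,(1 - r^2)$, and the t-quantile is replaced by a standard normal quantile, which is only reasonable because n is large. The function name and parameters are made up for this illustration.

```python
import math
from statistics import NormalDist


def prediction_interval(x_new, b0, b1, S, n, mean_x, var_x, t_quantile):
    """End points of the prediction interval for the response at x_new."""
    center = b0 + b1 * x_new
    spread = math.sqrt(1 + 1 / n + (x_new - mean_x) ** 2 / ((n - 1) * var_x))
    half_width = t_quantile * S * spread
    return center - half_width, center + half_width


# Summary values quoted on the slides for the father/son data.
b0, b1 = 86.83, 0.514
mean_x, sd_x = 171.92, 6.98
sd_y, r = 7.14, 0.498
n = 1078  # assumption: inferred from n - 2 = 1076 degrees of freedom

# Assumption: S derived from SSE = (n - 1) * s^2(y) * (1 - r^2).
S = sd_y * math.sqrt((n - 1) * (1 - r ** 2) / (n - 2))

# Assumption: normal quantile used in place of the t(1076) quantile.
t_q = NormalDist().inv_cdf(0.95)  # alpha = 0.10, so 1 - alpha/2 = 0.95

lo, hi = prediction_interval(165, b0, b1, S, n, mean_x, sd_x ** 2, t_q)
print(lo, hi)  # approximate 90 % prediction interval for a 165 cm father
```

With exact t-quantiles (for example from R's qt) the end points would shift slightly; for small samples the normal shortcut should not be used.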

Example. Evaluation of the course (same data as above)
Can we predict the exam points from exercise points? Probably yes, but we cannot assess this using the method above, because the residuals are not normally distributed.

Example: Heights of fathers and sons
[Figure: scatter plot "Height", son's height plotted against father's height, with father's height 165 marked on the axis]

Residuals of the regression model when father is approximately 165 cm
[Figure: histogram of residuals vs. normal distribution; vertical axis Density]

Example: Heights of fathers and sons
[Figure: scatter plot "Height", son's height plotted against father's height]

Residuals of the regression model when father is approximately 170 cm
[Figure: histogram of residuals vs. normal distribution; vertical axis Density]

Example: Heights of fathers and sons
Can we predict the heights of sons from the heights of fathers? It seems that the residuals are normally distributed with equal variances, so we can use the regression model. (The normality assumption for residuals is not necessary; see course MS-C2128 Prediction and time series analysis.)

Height of son, when father is approximately 165 cm
[Figure: "Heights of sons"; horizontal axis Height, vertical axis Density]
Distribution of the heights of sons and the 90 % prediction interval, when the height of the father is 165 cm.

Height of son, when father is approximately 170 cm
[Figure: "Heights of sons"; horizontal axis Height, vertical axis Density]
Distribution of the heights of sons and the 90 % prediction interval, when the height of the father is 170 cm.

Example: Heights of fathers and sons (90 % prediction interval)
[Figure: scatter plot "Height" of sons' heights with the 90 % prediction interval]

What next?

Stochastics and Statistics Courses 2015-2016

MS-C2111 Stokastiset prosessit (Stochastic processes)
Period I, 5 cr, BSc. Lecturer: Lasse Leskelä. Prerequisites: MS-A050X Todennäköisyyslaskennan ja tilastotieteen peruskurssi, MS-A000X Matriisilaskenta, MS-A020X Differentiaali- ja integraalilaskenta 2.
Stochastic processes are used to model time-dependent random phenomena that arise in applications in engineering, economics and the natural sciences. In this course we learn to analyse stochastic population models using Markov processes, and the occurrence of unpredictable events using Poisson processes. We also learn to analyse betting strategies for simple games of chance using martingales. The contents of this course are important for most advanced courses in stochastics and statistics.

MS-E1600 Probability theory
Period III, 5 cr, MSc. Lecturer: Kalle Kytölä. Prerequisites: MS-C1540 Euklidiset avaruudet.
This course is about the mathematical foundations of randomness. Most advanced topics in stochastics and statistics rely on probability theory. The basic constructions are identical to measure theory, but there are a number of distinctly probabilistic features such as independence, notions of convergence of random variables, information contained in a sigma-algebra, conditional expectation, characteristic functions and generating functions, laws of large numbers and central limit theorems, etc. These questions are discussed together with selected applications.

MS-C2103 Koesuunnittelu ja tilastolliset mallit (Design of experiments and statistical models)
Periods III-IV, 5 cr, BSc/MSc. Lecturer: Heikki Seppälä. Prerequisites: MS-A050X Todennäköisyyslaskennan ja tilastotieteen peruskurssi.
The course presents the most common experimental designs and methods for carrying out statistical analysis. The aim is to learn to choose a suitable experimental design for a statistical test, to perform the test, and to analyse the results. The course covers the basics of regression analysis, analysis of variance, and selected experimental designs such as block designs, factorial experiments and the response surface method. The course uses the R software.

MS-C2128 Ennustaminen ja aikasarja-analyysi (Prediction and time series analysis)
Period II, 5 cr, BSc. Lecturer: Heikki Seppälä. Prerequisites: MS-A050X Todennäköisyyslaskennan ja tilastotieteen peruskurssi, MS-A020X Differentiaali- ja integraalilaskenta 2, (MS-C2111 Stokastiset prosessit).
"Prediction is difficult, especially about the future" (Niels Bohr). If certain mathematical assumptions hold, useful forecasts can be made on the basis of historical time series data. The aim of the course is to learn how time series are analysed and how they are used to produce forecasts. The course covers the most common models, such as ARIMA models and dynamic regression models, as well as other matters essential to the results, such as diagnostics and model selection. The course uses the R software.
[Figure: time series plot of tenor basis spread (bp) against date, 2007-2013]

MS-E1601 Brownian motion and stochastic analysis
Period II, 5 cr, MSc. Lecturer: Lauri Viitasaari. Prerequisites: MS-E1600 Probability theory, (MS-C2111 Stokastiset prosessit).
This course introduces the foundations of stochastic analysis and stochastic integration with respect to a Brownian motion. The course starts with a construction of Brownian motion and analysis of its basic properties, and continues with the construction of the Ito stochastic integral. We derive the Ito formula, which is the equivalent of the fundamental theorem of calculus for stochastic integrals, and discuss its applications to mathematical finance.

MS-E1996 Multivariate location and scatter
Period II, 5 cr, MSc. Lecturer: Pauliina Ilmonen. Prerequisites: at least one matrix algebra and one MSc level statistics/probability course.
When dealing with multivariate observations, the very first questions that come to mind are: Where is the data? How is it scattered? This is an advanced course in statistics for MSc and doctoral students. Only 10 students are admitted to this course, so email the lecturer ASAP to register. Topics include: M-estimates of location and scatter, MCD-estimates, spatial sign and rank based estimates, multivariate location tests, autocovariance matrices and applications, PCA using different location and scatter estimates, multivariate regression analysis based on spatial signs and ranks, scatter matrix based ICA, complex time series ICA, ICS and skewness and kurtosis.

MS-C2104 Tilastollisen analyysin perusteet (Introduction to Statistical Inference)
Periods III-IV, 5 cr, BSc/MSc. Lecturer: Pauliina Ilmonen. Prerequisites: MS-A050X Todennäköisyyslaskennan ja tilastotieteen peruskurssi, MS-A000X Matriisilaskenta.
The course is an introduction to computer-aided statistical analysis and statistical inference. The topics of the course are estimation and interval estimation, simple parametric and non-parametric tests, statistical dependence and correlation, linear regression analysis and analysis of variance. The course uses the R software.

MS-E2112 Multivariate statistical analysis
Periods III-IV, 5 cr, MSc. Lecturer: Pauliina Ilmonen. Prerequisites: at least one statistics/probability and one matrix algebra course.
This course is an introduction to multivariate statistical analysis. The goal is to learn the basics of common multivariate data analysis techniques and to use the methods in practice. The R software is used in the exercises of this course. The topics of the course are multivariate location and scatter, principal component analysis, bivariate correspondence analysis, multivariate correspondence analysis, canonical correlation analysis, discriminant analysis, classification, and clustering.

MS-E1602 Large random systems
Period IV, 5 cr, MSc. Lecturers: Lasse Leskelä and Kalle Kytölä. Prerequisites: MS-E1600 Probability theory, (MS-C2111 Stokastiset prosessit).
Many interesting random systems contain a large number of simpler constituents interacting with each other. This course covers both mathematical techniques for the study of such systems, and important probabilistic models of a range of different phenomena. The theory focuses on tightness and weak convergence of probability measures. Examples include random walk and Brownian motion, percolation, Curie-Weiss model and Ising model, and voter model and contact process.

Stochastics & statistics @ Aalto
Bachelor courses:
- First course in probability and statistics (E)
- Stochastic processes
- Introduction to Statistical Inference (E)
- Design of experiments and statistical models
- Prediction and time series analysis
Master's courses:
- Probability theory (E)
- Large random systems (E)
- Multivariate statistical analysis (E)
- Brownian motion and stochastic analysis (E)
- Multivariate location and scatter (E)

The course ends here. Thanks for attending and good luck with the exams!

References The slides are partly based on the previous lecture slides (Ilkka Mellin, Milla Kibble, Juuso Liesiö, Lasse Leskelä, Kalle Kytölä).