SGN-2506 Introduction to Pattern Recognition, Fall 2006
Exam, Dec 2006

Perform five (no more!) freely chosen problems of Problems 1-6. Each of them is worth 6 points. No literature. The needed formulas are given in context. Use of a function calculator is allowed. In order to use a programmable calculator, you must let the examination supervisor clear its memory at the beginning of the exam!

Problems:

1. Decide whether the following statements are true or false. If you do not know the answer, answer "–". Simply answer only either True, False, or "–". No additional explanations are needed! Each correct answer gives +1 point and each wrong answer gives -1 point. No answer ("–") is worth 0 points. The absolute minimum score of the problem is 0 points, even if you answer wrongly to every statement a)-f).

a) The density p(x | θ) is identifiable if there exist such different values θ ≠ θ′ of the parameter θ that for any given feature value x it holds that p(x | θ) ≠ p(x | θ′).

b) The components of a multinormal random vector are independent if the covariance matrix of its distribution is a diagonal matrix.

c) Any continuous function defined on the real line R defines a probability density function if and only if the integral of the function over R is equal to 1.

d) The decision regions of non-linear discriminant functions need not be convex.

e) The expectation-maximization algorithm can (reasonably) be applied also in situations where the global maximum of the likelihood does not exist.

f) The gradient of the perceptron criterion function J_p, ∇J_p(a) = Σ_{y ∈ Y(a)} (−y), is constant with respect to the augmented weight vector a of the corresponding linear discriminant function.

2. Consider a three-category k-nearest-neighbor classifier and the following set of two-dimensional labeled prototypes:

ω1: (4,4) (4,7) (8,6) (7,4) (4,9) (6,2) (2,6)
ω2: (-7,8) (3,1) (-2,-1) (4,3) (3,4) (4,-3) (0,6)
ω3: (3,2) (8,1) (3,3) (6,3) (5,0) (3,9) (1,7)

Classify the point (4,4) based on

(a) the nearest neighbor rule,
(b) the 3-nearest neighbor rule,
(c) the 7-nearest neighbor rule.

Measure distances using the L∞ metric: d∞(a, b) = max_i |a_i − b_i|.

Correct answers without an explanation of how you obtained them will be worth zero points, so explain how you got the answers!

3. a) Consider a linear machine with discriminant functions g_i(x) = w_i^T x + w_{i0}, i = 1,...,c. Show that the decision regions are convex by showing that if x_1 ∈ R_i and x_2 ∈ R_i, then λx_1 + (1 − λ)x_2 ∈ R_i for 0 ≤ λ ≤ 1.

b) Consider the set of feature vectors

D = { [0,0]^T, [0,1]^T, [1,0]^T, [1,1]^T }

and the classes

ω1 = { [0,0]^T, [1,1]^T },  ω2 = { [0,1]^T, [1,0]^T }.

Show that ω1 and ω2 are not linearly separable, that is, show that there is no line separating ω1 and ω2.

4. Consider a univariate two-class classification problem. You have collected the feature data D = D_1 ∪ D_2, where the features D_1 = {1.6, 1.8, 2.1, 2.2, 2.7, 2.8, 3.0, 3.2, 3.5} have been measured from objects known to belong to the class ω1 and the features D_2 = {2.3, 3.1, 3.3, 3.4, 3.6, 3.8, 4.4} have been measured from objects known to belong to the class ω2. A new object is chosen and the feature value 3.5 is measured. Classify this object.

Instruction: Let the window function be

ϕ(u) = 1, if |u| ≤ 1/2, and ϕ(u) = 0 otherwise.

Estimate the class-conditional densities with the Parzen window estimator of the form

p_n(x) = (1/n) Σ_{i=1}^{n} (1/V_n) ϕ((x − x_i)/h_n)

with h_n = 1.2. Estimate also the prior probabilities P(ω1) and P(ω2) based on the training data D. Thereafter, apply the Bayes minimum error rate classification rule to p_n and to the estimated prior probabilities. (Remember the Bayes rule: P(ω | x) = P(ω)p(x | ω)/p(x).)
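As a study aid for Problem 4 (not part of the original exam), the following minimal Python sketch evaluates the Parzen window estimates and the resulting minimum error rate decision numerically. The function names are only illustrative, and in one dimension the window volume is taken as V_n = h_n.

    # Minimal sketch for Problem 4 (plain Python; names are illustrative, not from the exam).
    D1 = [1.6, 1.8, 2.1, 2.2, 2.7, 2.8, 3.0, 3.2, 3.5]   # class omega_1
    D2 = [2.3, 3.1, 3.3, 3.4, 3.6, 3.8, 4.4]             # class omega_2

    def phi(u):
        # rectangular window: 1 if |u| <= 1/2, otherwise 0
        return 1.0 if abs(u) <= 0.5 else 0.0

    def parzen(x, data, h=1.2):
        # p_n(x) = (1/n) * sum_i (1/V_n) * phi((x - x_i)/h_n), with V_n = h_n in 1-D
        return sum(phi((x - xi) / h) / h for xi in data) / len(data)

    x = 3.5
    n1, n2 = len(D1), len(D2)
    P1, P2 = n1 / (n1 + n2), n2 / (n1 + n2)   # priors estimated from the training data D
    g1, g2 = P1 * parzen(x, D1), P2 * parzen(x, D2)
    print("decide omega_1" if g1 > g2 else "decide omega_2")

Comparing P(ω_i)·p_n(x | ω_i) for i = 1, 2 is equivalent to comparing the posteriors of the Bayes rule above, since the common denominator p(x) does not affect the decision.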
5. Let x_1 = (4, 5)^T, x_2 = (1, 4)^T, x_3 = (0, 1)^T, x_4 = (5, 0)^T, and consider the following three partitions:

(a) D_1 = {x_1, x_2}, D_2 = {x_3, x_4}
(b) D_1 = {x_1, x_4}, D_2 = {x_2, x_3}
(c) D_1 = {x_1, x_2, x_3}, D_2 = {x_4}

Which of these does the sum-of-squared-errors criterion (the K-means criterion) favor? (Remember the sum-of-squared-errors criterion: J({µ_1,...,µ_c}) = Σ_{i=1}^{c} Σ_{x ∈ D_i} ‖x − µ_i‖².)

6. In many pattern classification problems one has the option either to assign the pattern to one of the c classes or to reject it as being unrecognizable. If the cost for rejects is not too high, rejection may be a desirable action. Let

λ(α_i | ω_j) = 0,   if i = j, i, j = 1,...,c,
λ(α_i | ω_j) = λ_r, if i = c + 1,
λ(α_i | ω_j) = λ_s  otherwise,

where λ_r is the loss incurred for choosing the (c+1)th action, rejection, and λ_s is the loss incurred for making any substitution error. Show that the minimum risk is obtained if we decide ω_i if P(ω_i | x) ≥ P(ω_j | x) for all j ≠ i and if P(ω_i | x) ≥ 1 − λ_r/λ_s, and reject otherwise. What happens if λ_r = 0? What happens if λ_r > λ_s? (Remember the formula R(α_i | x) = Σ_{j=1}^{c} λ(α_i | ω_j) P(ω_j | x).)
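As a quick numerical check for Problem 5 (not part of the original exam), here is a short Python sketch of the sum-of-squared-errors criterion; the helper name sse is illustrative.

    # Minimal sketch for Problem 5 (plain Python; the helper name is illustrative).
    def sse(partition):
        # J = sum over clusters D_i of sum over x in D_i of ||x - mu_i||^2
        total = 0.0
        for cluster in partition:
            mx = sum(p[0] for p in cluster) / len(cluster)
            my = sum(p[1] for p in cluster) / len(cluster)
            total += sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in cluster)
        return total

    x1, x2, x3, x4 = (4, 5), (1, 4), (0, 1), (5, 0)
    for name, part in [("a", [[x1, x2], [x3, x4]]),
                       ("b", [[x1, x4], [x2, x3]]),
                       ("c", [[x1, x2, x3], [x4]])]:
        print(name, sse(part))   # the partition with the smallest J is the one favored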
SGN-2506 Introduction to Pattern Recognition, Fall 2006
Exam, 26-Feb-2007

Perform five (no more!) freely chosen problems of Problems 1-6. Each of them is worth 6 points. No literature. The needed formulas are given in context. Use of a function calculator is allowed. In order to use a programmable calculator, you must let the examination supervisor clear its memory at the beginning of the exam!

Problems:

1. Decide whether the following statements are true or false. If you do not know the answer, answer "–". Simply answer only either True, False, or "–". No additional explanations are needed! Each correct answer gives +1 point and each wrong answer gives -1 point. No answer ("–") is worth 0 points. The absolute minimum score of the problem is 0 points, even if you answer wrongly to every statement a)-f).

a) The Parzen density estimator is a non-parametric method for estimating also continuous densities based on finite data records.

b) The density p(x | θ) is identifiable if there exist such different values θ ≠ θ′ of the parameter θ that for any given feature value x it holds that p(x | θ) ≠ p(x | θ′).

c) The K-means algorithm belongs to the unsupervised classification methods.

d) The lower the training error, the lower the classification error also for new incoming features.

e) By introducing new features to the classification system, that is, by extending the feature space, the Bayes (classification) error can be decreased.

f) Defining L : R² → {0, 1},

L(x, y) = 0, if x = y, and L(x, y) = 1, if x ≠ y,

one can prove that L is a metric.

2. The random variables X and Y possess the joint uniform density over the region depicted (in black) in Figure 1 below. Find

a) the expression for the joint density f_{(X,Y)}(x, y); Hint: whenever the point (x, y) belongs to the black region, the joint density function f_{(X,Y)}(x, y) takes a positive constant value and vanishes otherwise;
b) the expression for the marginal density f_X(x);
c) the expression for the conditional density f_{Y|X}(y | x);
d) Are X and Y independent?

Explain briefly how you deduced the answers. Hint: You do not have to integrate anything.
Figure 1: The random variables X and Y in Problem 2 have the joint uniform distribution over the black region. That is, whenever the point (x, y) belongs to the black region, the joint density function f_{(X,Y)}(x, y) takes a positive constant value and vanishes otherwise.

3. Suppose a two-class classification problem with the priors P(ω1) = 0.3 and P(ω2) = 0.7. Suppose further the uniform class-conditional distributions on the closed hyper-intervals [a_i, b_i],

p(x | ω_i) = 1/vol([a_i, b_i]), if a_i ≤ x ≤ b_i, i = 1, 2, and p(x | ω_i) = 0 otherwise,

where vol([a_i, b_i]) is the volume of [a_i, b_i] and ≤ is operated componentwise. Compute the Bayes error. Explain and write down the needed computational steps.

(The Bayes error P(error) is

P(error) = ∫ P(error | x) p(x) dx,

where P(error | x) is the probability of misclassification given the feature value x and the integration is performed over the whole feature space, i.e. over all possible feature values x. The Bayes rule, in turn, is given by

P(ω | x) = p(x | ω) P(ω) / p(x),

and the volume of a hyper-interval by

vol([c, d]) = Π_{j=1}^{l} (d_j − c_j),
where l is the dimension of the vectors c and d.)

4. Consider a three-category k-nearest-neighbor classifier and the following set of two-dimensional labeled prototypes:

ω1: (-2,-2) (2,1) (0,-1) (0,0) (3,0) (3,1) (5,0)
ω2: (-1,0) (0,-1) (-1,-1) (4,2) (1,4) (4,-1) (1,1)
ω3: (-2,-2) (1,-6) (3,-3) (0,-2) (2,0) (0,9) (-1,3)

Classify the point (1,1) based on

(a) the nearest neighbor rule,
(b) the 3-nearest neighbor rule,
(c) the 5-nearest neighbor rule, and
(d) the 7-nearest neighbor rule.

Measure distances using the Mahalanobis metric:

L_{Mahalanobis,C}(a, b) = (a − b) C⁻¹ (a − b)^T,

where

C = [ 4/3  2/3 ]
    [ 2/3  4/3 ]

and a and b are (here) 2-dimensional row vectors. Correct answers without an explanation of how you obtained them will be worth zero points, so explain how you got the answers!

5. Let a one-dimensional classification task consist of c different classes ω1, ω2, ..., ω_c, each obeying either an exponential or a normal class-conditional density. Prove that the Bayes minimum error rate classifier can be implemented with at most quadratic discriminant functions g_i(x), i = 1, 2, ..., c, that is, each g_i(x) can be expressed in the form

g_i(x) = w_{i2} x² + w_{i1} x + w_{i0},

where w_{i2}, w_{i1} and w_{i0} are scalars.

(The exponential density function:

p(x | θ) = θ exp(−θx), if x ≥ 0, and p(x | θ) = 0 otherwise,

and the normal density function:

p(x | [µ, σ]) = (1/√(2πσ²)) exp(−(1/2)(x − µ)²/σ²).)

6. a) Consider the following six training samples from the classes ω1 and ω2:

ω1: (1, 2)^T  (2, 4)^T  (-3, -1)^T
ω2: (2, 4)^T  (1, 5)^T  (5, 0)^T
Are they linearly separable? Explain.

b) Compute the value of the perceptron criterion function for the above six training samples at a = (0, 0, 1)^T. Starting from a = (0, 0, 1)^T, take a single step of the perceptron algorithm with the learning rate η(k) = 1. Remember to augment and normalize the feature vectors.

(Perceptron criterion function:

J_p(a) = Σ_{y ∈ Y(a)} (−a^T y),

where Y(a) = {y | a^T y < 0}.)
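As a study aid for Problem 6 b) (not part of the original exam), a small Python sketch of the perceptron criterion and one batch fixed-increment step on augmented, normalized samples. The function names are illustrative, and the sample lists in the usage line are the ones listed in Problem 6 a) above.

    # Minimal sketch for Problem 6 b) (plain Python; names are illustrative, not from the exam).
    def augment_and_normalize(samples_w1, samples_w2):
        # augment with a leading 1 and multiply the omega_2 samples by -1
        ys = [(1.0, x1, x2) for (x1, x2) in samples_w1]
        ys += [(-1.0, -x1, -x2) for (x1, x2) in samples_w2]
        return ys

    def dot(a, y):
        return sum(ai * yi for ai, yi in zip(a, y))

    def Jp(a, ys):
        # J_p(a) = sum of -a^T y over the misclassified samples (a^T y < 0)
        return sum(-dot(a, y) for y in ys if dot(a, y) < 0)

    def perceptron_step(a, ys, eta=1.0):
        # batch fixed-increment update: a <- a + eta * (sum of misclassified y)
        mis = [y for y in ys if dot(a, y) < 0]
        return tuple(ai + eta * sum(y[i] for y in mis) for i, ai in enumerate(a))

    ys = augment_and_normalize([(1, 2), (2, 4), (-3, -1)], [(2, 4), (1, 5), (5, 0)])
    a = (0.0, 0.0, 1.0)
    print(Jp(a, ys), perceptron_step(a, ys))

The same helpers can also be used to watch the non-convergent, cycling behaviour that Problem 5 of the 24-Sep-2007 exam below asks to demonstrate on a non-separable data set.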
SGN-2500 Johdatus hahmontunnistukseen, Spring 2007
SGN-2506 Introduction to Pattern Recognition, Fall 2006
Exam 24-Sep-2007

The exam is bilingual: the problems are stated both in Finnish and in English. You may answer in either language.

Perform five (no more!) freely chosen problems of Problems 1-6. Each of them is worth 6 points. No literature. The needed formulas are given in context. Use of a function calculator is allowed. In order to use a programmable calculator, you must let the examination supervisor clear its memory at the beginning of the exam!

Problems:

1. Decide whether the following statements are true or false. If you do not know the answer, answer "–". Simply answer only either True, False, or "–". No additional explanations are needed! Each correct answer gives +1 point and each wrong answer gives -1 point. No answer ("–") is worth 0 points. The absolute minimum score of the problem is 0 points, even if you answer wrongly to every statement a)-f).

a) The discriminant function corresponding to the Bayes (minimum error rate) classifier is linear (with respect to the feature vector). True/False/–

b) The density p(x | θ) is identifiable if there exist such different values θ ≠ θ′ of the parameter θ that for any given feature value x it holds that p(x | θ) ≠ p(x | θ′). True/False/–

c) The maximum likelihood method is a non-parametric way to estimate density functions based on observed data. True/False/–
d) The perceptron algorithm tries to minimize the training error. True/False/–

e) The general density estimator p_n(x) based on the data D is defined as p_n(x) = k/(nV), where V is the volume of the region B around x, n is the total number of data points in D, and k is the number of the data points enclosed by B. True/False/–

f) For normally distributed classes we can find discriminant functions that are at most quadratic. True/False/–

2. The continuous random variable X measures the temperature (in °C) of a component of a system, and X possesses the continuous density

f(x) = c exp(a − x), if x ≥ a, and f(x) = 0, if x < a.

Consider a as a fixed constant. Determine the constant c such that f is a density function. Thereafter, compute the probability that the temperature of the component is below zero degrees Celsius.

3. Consider a three-category k-nearest-neighbor classifier and the following set of two-dimensional labeled prototypes:

ω1: (4,4) (4,7) (8,6) (7,4) (4,9) (6,2) (2,6)
ω2: (-7,8) (3,1) (-2,-1) (4,3) (3,4) (4,-3) (0,6)
ω3: (3,2) (8,1) (3,3) (6,3) (5,0) (3,9) (1,7)

Classify the point (4,4) based on

(a) the nearest neighbor rule,
(b) the 3-nearest neighbor rule,
(c) the 7-nearest neighbor rule.

Measure distances using the L∞ metric: d∞(a, b) = max_i |a_i − b_i|.

4. If a set D of n samples having different values is partitioned into c < n disjoint subsets D_1, ..., D_c, the sample mean µ_i is undefined if the set D_i is empty. Show that the partition minimizing the sum-of-squared errors

J_e = Σ_{i=1}^{c} Σ_{x ∈ D_i} ‖x − µ_i‖²

includes no empty sets.

Hint: Given a partition with an empty set, you can always find a partition with a smaller criterion value J_e by moving one sample (which kind of sample?) into the empty set so that the criterion value decreases.

The norm is Euclidean: ‖y‖ = √(y^T y). You may apply (without proving it) the result

Σ_{m=1}^{k+1} ‖x_m − µ^(k+1)‖² = Σ_{m=1}^{k} ‖x_m − µ^(k)‖² + k(k + 1) ‖µ^(k+1) − µ^(k)‖²,

where µ^(k) = (1/k) Σ_{m=1}^{k} x_m is the mean computed over the k samples x_m.

5. The following four training samples from the classes ω1 and ω2 are not linearly separable. Demonstrate the typical behavior of the perceptron algorithm in this kind of classification task.
ω1: (0, 0)^T  (1, 1)^T
ω2: (0, 1)^T  (1, 0)^T

Instruction: Compute the value of the perceptron criterion function for the above four training samples at some appropriately chosen a. Starting from this point a, take steps of the perceptron algorithm with the learning rate η = 1 until you find a cycle, that is, until you notice that the algorithm does not converge. Remember to augment the feature vectors and to normalize them (multiply the samples of ω2 by −1).

(Perceptron criterion function:

J_p(a) = Σ_{y ∈ Y(a)} (−a^T y),

where Y(a) = {y | a^T y < 0}.)

6. A two-dimensional classifier has decision boundaries which form the set

{(x, y) ∈ R² | x = y or x = y + 1 or x = y + 2}.

(a) At most how many classes does the classification task consist of? Explain. (2p.)

(b) Suppose the number of classes is the maximum of part (a), say n. Is the classifier then necessarily linear? Prove your answer. (2p.)

(c) Now let the task consist of n − 1 classes, that is, one class less than the maximum of part (a). Is the classifier then necessarily linear? Prove your answer. (2p.)
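As a geometric aid for Problem 6 (not part of the original exam, and not a model answer), the following tiny Python sketch only illustrates how the three parallel boundary lines x = y, x = y + 1 and x = y + 2 split the plane into strips indexed by the value of x − y; the function name is illustrative.

    # Minimal sketch (plain Python; illustrative only, not a model answer to Problem 6).
    def strip_index(x, y):
        # 0: x - y < 0,  1: 0 < x - y < 1,  2: 1 < x - y < 2,  3: x - y > 2
        d = x - y
        return sum(d > t for t in (0, 1, 2))

    for point in [(-1.0, 0.0), (0.5, 0.0), (1.5, 0.0), (3.0, 0.0)]:
        print(point, strip_index(*point))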