Large Scale Sequence Analysis with Applications to Genomics

Koko: px
Aloita esitys sivulta:

Download "Large Scale Sequence Analysis with Applications to Genomics"

Transkriptio

1 Large Scale Sequence Analysis with Applications to Genomics Gunnar Rätsch, Max Planck Society Tübingen, Germany Talk at CWI, Amsterdam October 23, 2009 Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

2 Discovery of the Nuclein (Friedrich Miescher, 1869) T ubingen, around 1869 Discovery of Nuclein: from lymphocyte & salmon multi-basic acid ( 4) If one... wants to assume that a single substance... is the specific cause of fertilization, then one should undoubtedly first and foremost consider nuclein (Miescher, 1874) Gunnar R atsch (FML, T ubingen) Large Scale Sequence Analysis October, 23,

3 Research Topics Machine Learning 1 Statistical inference methods for structured data Develop fast and accurate learning methods 2 Convergence properties of iterative algorithms Boosting-like algorithms and semi-infinite LPs 3 Genome annotation Predict features encoded on DNA Molecular Biology 4 Biological networks Understand interactions between gene products 5 Analysis of polymorphisms Discover polymorphisms and associate with phenotypes Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

4 Research Topics Machine Learning 1 Statistical inference methods for structured data Develop fast and accurate learning methods 2 Convergence properties of iterative algorithms Boosting-like algorithms and semi-infinite LPs 3 Genome annotation Predict features encoded on DNA Molecular Biology 4 Biological networks Understand interactions between gene products 5 Analysis of polymorphisms Discover polymorphisms and associate with phenotypes Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

5 Machine Learning Given: Observations of some complex phenomenon Goal: Learn from data & build predictive models Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

6 Machine Learning Given: Observations of some complex phenomenon Goal: Learn from data & build predictive models Example: Two different classes of observations Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

7 Machine Learning Given: Observations of some complex phenomenon Goal: Learn from data & build predictive models Example: Inferred classification rule Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

8 Machine Learning Given: Observations of some complex phenomenon Goal: Learn from data & build predictive models Recent contributions: 1 Large scale sequence classification 2 Analysis and explanation of learning results 3 Sequence segmentation & structure prediction k mer Length Position Log-intensity transcript Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

9 Learning about the Transcriptome Machine Learning View How to learn to predict what these processes accomplish? How well can we predict it from the available information? Biological View 1 What can we not predict yet? What is missing? 2 Can we derive a deeper understanding of these processes? Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

10 Learning about the Transcriptome Machine Learning View How to learn to predict what these processes accomplish? How well can we predict it from the available information? Biological View 1 What can we not predict yet? What is missing? 2 Can we derive a deeper understanding of these processes? Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

11 Learning about the Transcriptome Machine Learning View How to learn to predict what these processes accomplish? How well can we predict it from the available information? Biological View 1 What can we not predict yet? What is missing? 2 Can we derive a deeper understanding of these processes? Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

12 Learning about the Transcriptome Machine Learning View How to learn to predict what these processes accomplish? How well can we predict it from the available information? Biological View 1 What can we not predict yet? What is missing? 2 Can we derive a deeper understanding of these processes? Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

13 Computational Genome Annotation Simplest formulation: Given a DNA sequence x { A, C, G, T } L Find the correct label sequence y = y 1 y 2... y L (y i Y = { intergenic, 5 UTR, coding, intron,... }) Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

14 Example: C. elegans (I: 43,500-52,050) GAAGAAATGGAGCATTTGCGCTCCATCACACTCTCAGACAATTTCATTTTCCACATCCTATATATATTTTGGTTTTTCTGTCGTATTTTGTTTTAATTTATTGGTATTTCGTTCAAAAATAATTATTTTGACTGTATTTTTGGTTGCATA CATGTAGAACTGCTGTTTTTTAAGATATTCTGCCCATTCAAGTTTTTCAGTGTAAAATTGATATATTTCATTCCAACTGAAAATGAGATCGAAACGATGGAAAACCTCGGATATTACTGATTATGGAAAGAAGAGAAAAGAATCGGAAAG TTGTGGATCAAGTTCACCGATTCTCGAAACACAGTCATCTGGCGGTGCGGAACTTGACGAAGTTACTGAGGATGAATATTCTAGTAATTCGAGCAGTAATGAAACTAGCGACGAAGAGGAAAACTCAGAAGTACCAAATGTCTTATCTAT AACAGAAAGAGGTAAGAATTGCGTCTTCTAGTGATCATACTTTTCGCCAGATTCCCTAATGTAATATATTTTGTTGTAGAGAAAAGTTGGCAAAAGTTAACGGAAAACGATTTGGGACGAATTCGTTTCATCTTGAAGTACACTAGCAAT ACTAAAAAATGCGTGAACGAGTATTTTCAATATAATCATGGGCAAAACAATGAAATTATGAAAAGTCTATTATTGGATACCGATGGAACTATGACTGCAAAGGCTTGTTCGGAATGTGCCTACGATTTGAATCAGTAAGTTACTCTCTCG ATTTATTCCCAAAATTAATATGTGCTTCAGGTGCCACTGCAAAAAACCGCTTCGCTTCATCAATGCTCCGTGTGGTTGGTTTGCTATTCAAAACTATAAATAGTTCACTGTTTCCGTTCAGAGGTCATCAACCAAGTTCTTCATGTTGAA AATGCGGAGCCCACCAGGATCAACCATGTAATCGCAACACTCTTCCGGAATCACATTGGCGAGATTTTGTTGGTCCACTCTATTTCTGTGCGAGAACTGTGATAAAACTAGTATTTTCAGCACAAAGGCTCGAACTGCGGAAGCTCGCGC ATCTGAAGAAGCTCAAATCAGGATTCAAATCCAAGACAACTCGAACGCATTCCAAAGATCGTATCATAACGATCCACAACCTTCATCAGCCGAAGAACATGAGGAAGATATCGTGGTGGATGGCTGAGTACGGAGCTCAAATGCCTTAAG GCGAAACAATTGGTTTTTTAATTTGCTGGTTATCATGTTAGATTTTGAACGTGTTAGGTCTTTCAATTGTTTTTTTTTTTCGAAATGTTGTTGTTCTAATAAATTTGTTTTATTTAATCAAACGTTTTTTAGTCTACTACGGGCGTGAAG CCAGATATCAGTGGTATCTTCTTATCAGAAGCTGAATCATTTCCGGTTGACAATGTTTGAAGGACATAAGAAAGGCTGTGTTACTGATTTCGACCATTGATTTGTTTATATATGGATATGTTCCACTGCCTTTTGGAAAGGCAGTATTCC CGGTATATATGGGCCTAATACGGAATCTAAAATAACCTGACACAAACCTGACGTTGACCTGTTGCCGGCCCGCGGCGGCTTAGTGTCAACTTGACAGCGGGTCGCGATTTCACCTGCCAGTTGTTCTCCATTCAGCAGCCAGCGACCTGC TGGCAGGTTGCCACTAACCTGACGCGGTTTACCTGTGTTATCGGCGCGTGCATAGCTTAGTGGTTTCAGGAAATGATGCTAGTAATCAGAAGATCGGGGTTCGGGAAACGGCAGGGGCTTGAAGGTTAGGTTCTATGAAGCAGGGCGAAG GGTTGACAAGGAGAGGCAATAAGCAAGTAGTAGGGGTTCTCTAGAAAACATTTTTGTCTTTAATATGCGTTTCCTACTGATTTATTATTGATATTTGGATCCCCTTTTCTAGAAAAAAAAATCAGAATCAGCAGAAAAATTTGAGAAAAA GTCATAGCAAATCAGAGTTGGTCAGAGTAAATCAGAGCTAGTCATAGTAAATCATAGCTAGTCAGAGAATATCAGAGTTAATCAGGGTAATAAGTAGACCTAGTCATAGTAAATCAGAGCTAGGCATAGTAAAGCGTGGTTACTCCGAGT AAAACCACACTTGCACCGAACTGCGGTTAGTGTGCTTTACCATTATGTAACTCCGCTTTTTACTCTGAGTTAGTATGATATGGTTTGTCTGAGCTGTGGTTGGGCTTCGCGGGAAACTTGAATAATTCGAGACAAAATCTAATTTTAGCG AATTTTCTTTAATTTCTTTGAGGTTTCTACGACAGAACTCGAAAAATTTCGGGTTTTAATGTTTACACATTTTATTTAAAATTGAATAATCAACTGCGGGACTCCTCGAAAATCACATGCTCATTTAAATTTTGAAGTTCAAACCTCAAA AAACGCGCAAAAACCAAATTCAGCTAGGATATCAAATTTATGATTGAAATCTATATTTTGATGCGGTGTTTCTGAAGTTTTCGCGATAAAATCCGAATAATAATTCCACGTACCGTATATTCTCTATCTAATTTCCAGGTCATTTTTTAA TGCAGCACTATTAGAGACTGTCGTACTACTGGAGACTGCAGCATTAATTTTCGAACGGCTACTGTCAATTATAGATCACTAGTATTTAGTCACAAAAGCTAATTTTTTAAGCAGAAATTCATAAAAATGTTTTCAATATTGCGAACTTTT GTAACAAAAAGACCCAGTAATTCAATTACTTTCGTAAATTATCAAAAAATCATCAAAAATATACAAAAAAATACCAAAAAATATTGAAACTTTCAAGTGACTCTTTCAATAGAAAATGGGGTGCAGCACTAATAGAGACTGCTGCACTAT TTTTCGGACCCTTTTTGAATGCAGCACTATTAGAGACTGCAGTATTTACTACTGGAGATGCAGCACTAATAGAGAATATACGGTATATACGTAATATATTCTTGCAGAAAAAAGTACGATTATCAATGAAAAATAGCTGATAAGAGGCTT TTGTTTGAACTAACAGACGGAACGACTCCGGTTTAGTTCAAAAAATTCTAAAAACACGTTGTGTCAGGCTGTCTCATTGCGGTTTGATCTACGAAAAATGCGGGAATATTTTTCCAGAAAAATTGTGACGTCAGCACGCTCTTAACCATG CGAAACGAGATGAGATGTCTGCGTCTCTTTTCCCGCATTTTTCGAAGATCAAAACGAATGGGACTTTCTGACTCCACGTGTAAAAAGGGGTTACGACGGACCCTGGCCTAGAAATTAGGCGTGAAAATTCTCGGGCACTGGATGTAGTGA ACGCCCGCGATGAAAAATTGGGGGAAAATTAGGCTTTCTTTGCGAGAAAGATTAATTAAAAATGTTTTCCTTTGTCGAAAATAATTTTTAAAAAACACACCACGTGTATTCAGCTCGACCAACGCCTCGAAAATTTTCAAAAAAGGCGGG AAAAATTAGTTGAATTCGCCAAGAGGAATTTCACCGCAGCGCGTGCAAAAATTTCAGCATTTGCGCGTGACGGTGTTTGCACAAATTACACCGAATGGTCGAGCTGAAAACACGTGCACACTTTTAAATAAAACTAGAAAATAAATCCCA GGCCTGCAAATATTGCACACAAAACCGTAATCCCCTTCGCGCTAAACAACACGCGCAACGATGCTCCGCTTGGGGACAAGGAAAAATTAATTTAACTCGGGATTTTCATTAAAAAATTAGGTTTTTAGTTAATTTTTCGATGTTTTCACT GCGAAAAAGTGTTAAAATAACGATTTTTCAACCTATTTTCAATTAATCCGTGCAAAAAATCGTGTATTTCTCGAGTTTTGAAAGAAATTTATGAAAATCGGCATTTTTAATAATGGTTTTTCAAATAAAAATATAATTTTTCGGTGCAGA AAAGTCGTTGCTCGTACAGTTTTTTTAAAGCATTTTCACATCAAAATCCTCCATTTTTCCAGTAAATCGATATGGAGTGCGACGAGACAAAGCTGAGCGACGGCGCAAGCGGCTGGGTGCCGAGTATCCCGACAGATATCGATTCAAAAG ACACACCGTTGCTCGATATATCTTCTCAGGCGATTTGGGCGCTTTCCAGTTGTAAAAGCGGTAAATTTTCCGACTTTCAAGGGAGAAAAGTGTAGAAAAATCGAAATTACTTCTTAAAAATCTCGTAAAAATCGAATTCTTTCAGGATTC GGCATCGACGAGCTCCTATCCGACAGTGTTGAGAAATATTGGCAAAGCGATGGCCCGCAGCCGCACACGATTCTTCTAGAATTCCAGAAAAAGACCGACGTGGCTATGATGATGTTCTATTTGGATTTTAAAAACGACGAGTCTTATACA CCGTCAAAGTTAGCATTTTTGGCTTTTTCAAACGAAAAAATACAATGAAACACTGAATATCTAGTTTTTTTCTCAATTTTTGCCTAAAAAACGGCGATTTTTCACTAGCTTTTCAATTAAAATTTGAACAAAAAGTTTTTTAAAGGAAAA ACATGAATTTCTAGCTTTTTCAGAGGTTTTCTATTAAAAAATAGAGATTTTTGTGATATCTGACTGAAAAATTACCAAACTGTCGATTTTTTTAAACTATTTTTCACTTAAAATCTGCAATTTTTTTTTTCGAGGAAACATGTGAATTTC AAGCTTTTTCAGAGATTTTCTATGAAAAAGGTTCGTGCCGAGACCCATGTGCTTTTAAACTTCAGAATTTTCCCAATTTTGAAATTAAAAAGAGAATGAAAATTGATTTTCATGGAAAAATGCGTTTTTGGCCCAAAACCTCCAAAAAGT ACAAATATAGGTCGACTTTCAACTGTTTTAGATCAATTTTTTTGCAGAATTCAAGTAAAAATGGGTTCATCTCACCAGGATATATTTTTCCGTCAAACACAAACATTCAACGAGCCCCAGGGATGGACATTTATCGATTTACGCGACAAA AATGGGAAACCGAATCGCGTTTTTTGGCTTCAAGTACAAGTTATTCAGAATCATCAAAATGGGAGAGATACTCATATAAGGTAGAGGAATTGAGAATTTCAGAACGAAAATTGCCGAAAAAATGAAATTTTAGCGAATTTGAGTCGGAAA TTTCGAAATTTGATTGATTTTAAGCAAATTTCCAACTAAAATCTTGAAAATTTGATCTTTTTAGATAAATTTTTTTTTAATTTTGTGCTTTTCAAAAAACCTCAAAAAACAATTAAAAATTGAAGTAAAATTAATTTTTCAACAATTTTT GAAAGGCCGAATTTTTGATTGAAAATTTTCACAATTTGTCCATTTTGTGGTGGGGCTTATTCCGAAAAATCGTTGTTTTTTTTTTCAAAAAAGTTATAAAAACTTTAAAATTGCCATGTAAAATATGTTTATTCTCAGACCTCGTAGGCA CGAAGCAGGCGTAGGTCGCCTCGCAATAAATTTGAAAATCTCAAGAAAAATCAATAAATTTGTGATTAATCAAAAAAATTTAATTTCCTGGTCCCAGCACGAATGCTATTTTTCGAAAAAAAAAAAGAGGCGAGCCTAATATAGACCACG CCCACAAAATGGGCAAAAGTTTGATTTTTCAAAAAATCGAAACAAAAATTTTTCCAATTTTGTGAGATTTTAAAATTTCCGGTTTTTGGAAAATCGAAAAAAAATTTCTCGTTTTTTAATTTTCAAAAAAAATTGTGCCTAAAATTCAAA AAAAAAATCAATACTTTCTCAAAATTTCCAGAAAACAGTCCATTTTCCAGGCACGTTCGAGTCCTTGGACCCCAGCGATCTCGTGTCTCCACAACGAATCGAATATTCACCGGAGAACCACACGGACCGATTCCCGATAAAAATATCACT AATTTCGACGACGAGGATTTTGCCAATTTTATCGATCACTCACTTGTTCACTTATCACTTCGTTAAATTTACCTCCAGTGATTCCAGATAATGAGCCAGTTTTGCATTGAAATTTAGTGCCAAAATATAGAAAATCGCATGATTTAACAT AAAATAGCGTTTCGAATTGAAACAATGGAAAAAAAGTGCTATGATGATTTTTTAACACTTTTAATTGTTCCAATTTGAAGTAAAATCTATTTTCAGATAAATCAACTGATTTTCTATATTCTGCCACTAAAGCTTAAAAACTTGCCCTGC TGTCCTAACCTTCAAATTGTTCCCTGCAAATTTTATTATTCTTGTTTCATATTTTTGCGATTGCTTCGCGAGACCCAAACTCACACATTTACCTGTAAAATATAATCGAATAATTATTTATATATTTTCTGTAAATTTCCTTAGTATACT ATAAATTTTCTGATCTCTCTTCAAAAATCGCTAGAAAAAATAAACAAATGTCGGTTTAAAAATTCCTGGTAATTTACCTTCTATAGAAAATTTTTCGAAAAAAAAACCGAAGAAATTCAGATGGAAATTCCCGATCCCGAACTGCCGGGA ATACCGATTGATCCGCAAGATTTGGAGATTCTAGACACGCCCACACGGTTTTACGAGAAGCTTTTAGTGCGTTTTTCGTGTCGGGACCCGGAAATTTGACATTTTTGGCGCGCGGCTTGTTAGACTCCAAACCTTTTCAAAGATTTTTTT TTCGAATTAAATAACATTCGTGCTTGGGCCCGGAAATTGAATTTTTGATTTGAAAACAATTTTTTTTGAGTCCAAAATTTTCAAAGTTTGTCCATTTTTGGCGCGTGGCCTAGTAGGATCCGCCCCTTCTAAATTTTTTTTGAGCAAGTT TTCTGAAGCATTGATTTCAAAAATTTTTTTTGGAAATTTCTGGTTTATTTTTCCGGTTTTTTTCCGAGTTGCTGTTTAAGTTTGGAGAAATTCCAGAATTTGTCAATTTTTGGGGCGTGGCTTTTTCAGTAAGCACAGTTTTTTTTTTTT GAAAAATTGAAATTTTCGCGGTGCGGTTCAAGAAAAACCACAAAAACTCAATGATTTTTTAACGAAAATTTCAAATTTCTTGCAAGACCTACTGCAATTTCGATTTTTAGAAACTTTTTGAAAAAAATCCGAATTTTCTGATTTAGCCCC GCCCCAAAAATGGAAAGATTTCCGAAAATTCGAACCAAAAGTTCGCAAAAACTTGAATTTCTCTCACACAGATTGACGCGCTAATTTGAATTTTTCCAAAAATAAGCCCCGCCCCAAAAATGGACAAATTTTAAAAATTTTGAACCAAAT AAATTCAATTTTTTTTCGCTTTTTTCCGTTTTCGAACAAAAAATTCTAAAAATATATGGTTCTAGGCGGGGCTCAGGCACCCATCTACCTACTTAAAAATGCGTTAAATTTCAGGAATTAACTGCATCAACCGAACGGCGTCTCGCATTG TGTAGTCTGTATTTGGGCGAAGGAGATCTCGAAAAAAATCTGATCGCTGCGATCCGAGAAAGATCCGAAAAATCCGAGATTGAAGTGACGATTCTGTTGGATTTTTTGCGCGGAACACGGACCAATTCAAGCGGCGAAAGTAGTGTAACA GTGCTGAAACCTATTTCGGAAAAGTCAAAAGTTGGTTTTTTTTGCAAAAAAAAATCGATAAATCGATAAAAACCGACAATTTTGAGAATTTTCATTTCAAATTTGAGTCCCACATGCGCCTTTAAATATGGTGTACTGTAGTTTTAGCTC GAATGTTGAATTTCAAAAATTGAGAATAAAGAAATGTCGTGACGAGACCCACAAATGTTTTGAAAAAAATTTTCAATTTCAAAAAAATGTAAAAAATTGGGAATTTCCCTCCAAAAGTTAAATTGGTTTAGTCACAAACTTTGAAATTTT GAAATAAAATTTTTTTCGGCTAAAAATAAGTATTTTTTAAAAACTATTTTGAAGAAAAAAAGTTAGGTCTCGCCACGATGTATCTTGTATATGTGTATCTAAATTGCCATGTCGTGACGAGACCCTCTCATATTTTACACTGCAACTTTT TCCTCACGAGGGACGAGGAAAAGTGGTTTCTAGGCCATGGCCGAGGGGCCGACAAGTTTCATCGGCCATTTATCTTGCTTTGTTTTCCGCCTGTTTTCTTTCGTTTTTCACAGCTTTTTCCCATTTTTTCTTATTAAAACTGATAAATAA ATATTTTTGCAGATGCCAAAACGATTTTCAAGTAAAAAAATCATGTATTCAGTGGGCAAGCAGCGGTGAAAGTGGGCATTGTAATATGATGGATTACGGGAATACAAAACCTAAACTTTTTCTGAAACATGATACATATGATGCTTAAAT GCTGAGACTACCTGATTTTCATAACGAGACCGCTGAAAAAGTTTTGAGGTTTTCAAAATTCAACTTTTTGTGCGAAAATCTCGACTTTTTCACCGAAAAAGTTGAATTTTGGAAACCTCAAAACTTTTTCAGCGGTCTTGATATGAAAAT CAGGTAGCTTCAGCATCTAAGCAGCATATGTATCATGTTAAAGAAAAAGTTTAGGTTTTGTATTCCTGTAATCCATCATATTACATTGCCCACTTTCACCGCTGCTTGCCCACTGAATACATAATTTTTTCACTTGGAAATTGTTTTAGC Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

15 Example: C. elegans (I: 43,500-52,050) GAAGAAATGGAGCATTTGCGCTCCATCACACTCTCAGACAATTTCATTTTCCACATCCTATATATATTTTGGTTTTTCTGTCGTATTTTGTTTTAATTTATTGGTATTTCGTTCAAAAATAATTATTTTGACTGTATTTTTGGTTGCATA CATGTAGAACTGCTGTTTTTTAAGATATTCTGCCCATTCAAGTTTTTCAGTGTAAAATTGATATATTTCATTCCAACTGAAAATGAGATCGAAACGATGGAAAACCTCGGATATTACTGATTATGGAAAGAAGAGAAAAGAATCGGAAAG TTGTGGATCAAGTTCACCGATTCTCGAAACACAGTCATCTGGCGGTGCGGAACTTGACGAAGTTACTGAGGATGAATATTCTAGTAATTCGAGCAGTAATGAAACTAGCGACGAAGAGGAAAACTCAGAAGTACCAAATGTCTTATCTAT AACAGAAAGAGGTAAGAATTGCGTCTTCTAGTGATCATACTTTTCGCCAGATTCCCTAATGTAATATATTTTGTTGTAGAGAAAAGTTGGCAAAAGTTAACGGAAAACGATTTGGGACGAATTCGTTTCATCTTGAAGTACACTAGCAAT ACTAAAAAATGCGTGAACGAGTATTTTCAATATAATCATGGGCAAAACAATGAAATTATGAAAAGTCTATTATTGGATACCGATGGAACTATGACTGCAAAGGCTTGTTCGGAATGTGCCTACGATTTGAATCAGTAAGTTACTCTCTCG ATTTATTCCCAAAATTAATATGTGCTTCAGGTGCCACTGCAAAAAACCGCTTCGCTTCATCAATGCTCCGTGTGGTTGGTTTGCTATTCAAAACTATAAATAGTTCACTGTTTCCGTTCAGAGGTCATCAACCAAGTTCTTCATGTTGAA AATGCGGAGCCCACCAGGATCAACCATGTAATCGCAACACTCTTCCGGAATCACATTGGCGAGATTTTGTTGGTCCACTCTATTTCTGTGCGAGAACTGTGATAAAACTAGTATTTTCAGCACAAAGGCTCGAACTGCGGAAGCTCGCGC ATCTGAAGAAGCTCAAATCAGGATTCAAATCCAAGACAACTCGAACGCATTCCAAAGATCGTATCATAACGATCCACAACCTTCATCAGCCGAAGAACATGAGGAAGATATCGTGGTGGATGGCTGAGTACGGAGCTCAAATGCCTTAAG GCGAAACAATTGGTTTTTTAATTTGCTGGTTATCATGTTAGATTTTGAACGTGTTAGGTCTTTCAATTGTTTTTTTTTTTCGAAATGTTGTTGTTCTAATAAATTTGTTTTATTTAATCAAACGTTTTTTAGTCTACTACGGGCGTGAAG CCAGATATCAGTGGTATCTTCTTATCAGAAGCTGAATCATTTCCGGTTGACAATGTTTGAAGGACATAAGAAAGGCTGTGTTACTGATTTCGACCATTGATTTGTTTATATATGGATATGTTCCACTGCCTTTTGGAAAGGCAGTATTCC CGGTATATATGGGCCTAATACGGAATCTAAAATAACCTGACACAAACCTGACGTTGACCTGTTGCCGGCCCGCGGCGGCTTAGTGTCAACTTGACAGCGGGTCGCGATTTCACCTGCCAGTTGTTCTCCATTCAGCAGCCAGCGACCTGC TGGCAGGTTGCCACTAACCTGACGCGGTTTACCTGTGTTATCGGCGCGTGCATAGCTTAGTGGTTTCAGGAAATGATGCTAGTAATCAGAAGATCGGGGTTCGGGAAACGGCAGGGGCTTGAAGGTTAGGTTCTATGAAGCAGGGCGAAG GGTTGACAAGGAGAGGCAATAAGCAAGTAGTAGGGGTTCTCTAGAAAACATTTTTGTCTTTAATATGCGTTTCCTACTGATTTATTATTGATATTTGGATCCCCTTTTCTAGAAAAAAAAATCAGAATCAGCAGAAAAATTTGAGAAAAA GTCATAGCAAATCAGAGTTGGTCAGAGTAAATCAGAGCTAGTCATAGTAAATCATAGCTAGTCAGAGAATATCAGAGTTAATCAGGGTAATAAGTAGACCTAGTCATAGTAAATCAGAGCTAGGCATAGTAAAGCGTGGTTACTCCGAGT AAAACCACACTTGCACCGAACTGCGGTTAGTGTGCTTTACCATTATGTAACTCCGCTTTTTACTCTGAGTTAGTATGATATGGTTTGTCTGAGCTGTGGTTGGGCTTCGCGGGAAACTTGAATAATTCGAGACAAAATCTAATTTTAGCG AATTTTCTTTAATTTCTTTGAGGTTTCTACGACAGAACTCGAAAAATTTCGGGTTTTAATGTTTACACATTTTATTTAAAATTGAATAATCAACTGCGGGACTCCTCGAAAATCACATGCTCATTTAAATTTTGAAGTTCAAACCTCAAA AAACGCGCAAAAACCAAATTCAGCTAGGATATCAAATTTATGATTGAAATCTATATTTTGATGCGGTGTTTCTGAAGTTTTCGCGATAAAATCCGAATAATAATTCCACGTACCGTATATTCTCTATCTAATTTCCAGGTCATTTTTTAA TGCAGCACTATTAGAGACTGTCGTACTACTGGAGACTGCAGCATTAATTTTCGAACGGCTACTGTCAATTATAGATCACTAGTATTTAGTCACAAAAGCTAATTTTTTAAGCAGAAATTCATAAAAATGTTTTCAATATTGCGAACTTTT GTAACAAAAAGACCCAGTAATTCAATTACTTTCGTAAATTATCAAAAAATCATCAAAAATATACAAAAAAATACCAAAAAATATTGAAACTTTCAAGTGACTCTTTCAATAGAAAATGGGGTGCAGCACTAATAGAGACTGCTGCACTAT TTTTCGGACCCTTTTTGAATGCAGCACTATTAGAGACTGCAGTATTTACTACTGGAGATGCAGCACTAATAGAGAATATACGGTATATACGTAATATATTCTTGCAGAAAAAAGTACGATTATCAATGAAAAATAGCTGATAAGAGGCTT TTGTTTGAACTAACAGACGGAACGACTCCGGTTTAGTTCAAAAAATTCTAAAAACACGTTGTGTCAGGCTGTCTCATTGCGGTTTGATCTACGAAAAATGCGGGAATATTTTTCCAGAAAAATTGTGACGTCAGCACGCTCTTAACCATG CGAAACGAGATGAGATGTCTGCGTCTCTTTTCCCGCATTTTTCGAAGATCAAAACGAATGGGACTTTCTGACTCCACGTGTAAAAAGGGGTTACGACGGACCCTGGCCTAGAAATTAGGCGTGAAAATTCTCGGGCACTGGATGTAGTGA ACGCCCGCGATGAAAAATTGGGGGAAAATTAGGCTTTCTTTGCGAGAAAGATTAATTAAAAATGTTTTCCTTTGTCGAAAATAATTTTTAAAAAACACACCACGTGTATTCAGCTCGACCAACGCCTCGAAAATTTTCAAAAAAGGCGGG AAAAATTAGTTGAATTCGCCAAGAGGAATTTCACCGCAGCGCGTGCAAAAATTTCAGCATTTGCGCGTGACGGTGTTTGCACAAATTACACCGAATGGTCGAGCTGAAAACACGTGCACACTTTTAAATAAAACTAGAAAATAAATCCCA GGCCTGCAAATATTGCACACAAAACCGTAATCCCCTTCGCGCTAAACAACACGCGCAACGATGCTCCGCTTGGGGACAAGGAAAAATTAATTTAACTCGGGATTTTCATTAAAAAATTAGGTTTTTAGTTAATTTTTCGATGTTTTCACT GCGAAAAAGTGTTAAAATAACGATTTTTCAACCTATTTTCAATTAATCCGTGCAAAAAATCGTGTATTTCTCGAGTTTTGAAAGAAATTTATGAAAATCGGCATTTTTAATAATGGTTTTTCAAATAAAAATATAATTTTTCGGTGCAGA AAAGTCGTTGCTCGTACAGTTTTTTTAAAGCATTTTCACATCAAAATCCTCCATTTTTCCAGTAAATCGATATGGAGTGCGACGAGACAAAGCTGAGCGACGGCGCAAGCGGCTGGGTGCCGAGTATCCCGACAGATATCGATTCAAAAG ACACACCGTTGCTCGATATATCTTCTCAGGCGATTTGGGCGCTTTCCAGTTGTAAAAGCGGTAAATTTTCCGACTTTCAAGGGAGAAAAGTGTAGAAAAATCGAAATTACTTCTTAAAAATCTCGTAAAAATCGAATTCTTTCAGGATTC GGCATCGACGAGCTCCTATCCGACAGTGTTGAGAAATATTGGCAAAGCGATGGCCCGCAGCCGCACACGATTCTTCTAGAATTCCAGAAAAAGACCGACGTGGCTATGATGATGTTCTATTTGGATTTTAAAAACGACGAGTCTTATACA CCGTCAAAGTTAGCATTTTTGGCTTTTTCAAACGAAAAAATACAATGAAACACTGAATATCTAGTTTTTTTCTCAATTTTTGCCTAAAAAACGGCGATTTTTCACTAGCTTTTCAATTAAAATTTGAACAAAAAGTTTTTTAAAGGAAAA ACATGAATTTCTAGCTTTTTCAGAGGTTTTCTATTAAAAAATAGAGATTTTTGTGATATCTGACTGAAAAATTACCAAACTGTCGATTTTTTTAAACTATTTTTCACTTAAAATCTGCAATTTTTTTTTTCGAGGAAACATGTGAATTTC AAGCTTTTTCAGAGATTTTCTATGAAAAAGGTTCGTGCCGAGACCCATGTGCTTTTAAACTTCAGAATTTTCCCAATTTTGAAATTAAAAAGAGAATGAAAATTGATTTTCATGGAAAAATGCGTTTTTGGCCCAAAACCTCCAAAAAGT ACAAATATAGGTCGACTTTCAACTGTTTTAGATCAATTTTTTTGCAGAATTCAAGTAAAAATGGGTTCATCTCACCAGGATATATTTTTCCGTCAAACACAAACATTCAACGAGCCCCAGGGATGGACATTTATCGATTTACGCGACAAA AATGGGAAACCGAATCGCGTTTTTTGGCTTCAAGTACAAGTTATTCAGAATCATCAAAATGGGAGAGATACTCATATAAGGTAGAGGAATTGAGAATTTCAGAACGAAAATTGCCGAAAAAATGAAATTTTAGCGAATTTGAGTCGGAAA TTTCGAAATTTGATTGATTTTAAGCAAATTTCCAACTAAAATCTTGAAAATTTGATCTTTTTAGATAAATTTTTTTTTAATTTTGTGCTTTTCAAAAAACCTCAAAAAACAATTAAAAATTGAAGTAAAATTAATTTTTCAACAATTTTT GAAAGGCCGAATTTTTGATTGAAAATTTTCACAATTTGTCCATTTTGTGGTGGGGCTTATTCCGAAAAATCGTTGTTTTTTTTTTCAAAAAAGTTATAAAAACTTTAAAATTGCCATGTAAAATATGTTTATTCTCAGACCTCGTAGGCA CGAAGCAGGCGTAGGTCGCCTCGCAATAAATTTGAAAATCTCAAGAAAAATCAATAAATTTGTGATTAATCAAAAAAATTTAATTTCCTGGTCCCAGCACGAATGCTATTTTTCGAAAAAAAAAAAGAGGCGAGCCTAATATAGACCACG CCCACAAAATGGGCAAAAGTTTGATTTTTCAAAAAATCGAAACAAAAATTTTTCCAATTTTGTGAGATTTTAAAATTTCCGGTTTTTGGAAAATCGAAAAAAAATTTCTCGTTTTTTAATTTTCAAAAAAAATTGTGCCTAAAATTCAAA AAAAAAATCAATACTTTCTCAAAATTTCCAGAAAACAGTCCATTTTCCAGGCACGTTCGAGTCCTTGGACCCCAGCGATCTCGTGTCTCCACAACGAATCGAATATTCACCGGAGAACCACACGGACCGATTCCCGATAAAAATATCACT AATTTCGACGACGAGGATTTTGCCAATTTTATCGATCACTCACTTGTTCACTTATCACTTCGTTAAATTTACCTCCAGTGATTCCAGATAATGAGCCAGTTTTGCATTGAAATTTAGTGCCAAAATATAGAAAATCGCATGATTTAACAT AAAATAGCGTTTCGAATTGAAACAATGGAAAAAAAGTGCTATGATGATTTTTTAACACTTTTAATTGTTCCAATTTGAAGTAAAATCTATTTTCAGATAAATCAACTGATTTTCTATATTCTGCCACTAAAGCTTAAAAACTTGCCCTGC TGTCCTAACCTTCAAATTGTTCCCTGCAAATTTTATTATTCTTGTTTCATATTTTTGCGATTGCTTCGCGAGACCCAAACTCACACATTTACCTGTAAAATATAATCGAATAATTATTTATATATTTTCTGTAAATTTCCTTAGTATACT ATAAATTTTCTGATCTCTCTTCAAAAATCGCTAGAAAAAATAAACAAATGTCGGTTTAAAAATTCCTGGTAATTTACCTTCTATAGAAAATTTTTCGAAAAAAAAACCGAAGAAATTCAGATGGAAATTCCCGATCCCGAACTGCCGGGA ATACCGATTGATCCGCAAGATTTGGAGATTCTAGACACGCCCACACGGTTTTACGAGAAGCTTTTAGTGCGTTTTTCGTGTCGGGACCCGGAAATTTGACATTTTTGGCGCGCGGCTTGTTAGACTCCAAACCTTTTCAAAGATTTTTTT TTCGAATTAAATAACATTCGTGCTTGGGCCCGGAAATTGAATTTTTGATTTGAAAACAATTTTTTTTGAGTCCAAAATTTTCAAAGTTTGTCCATTTTTGGCGCGTGGCCTAGTAGGATCCGCCCCTTCTAAATTTTTTTTGAGCAAGTT TTCTGAAGCATTGATTTCAAAAATTTTTTTTGGAAATTTCTGGTTTATTTTTCCGGTTTTTTTCCGAGTTGCTGTTTAAGTTTGGAGAAATTCCAGAATTTGTCAATTTTTGGGGCGTGGCTTTTTCAGTAAGCACAGTTTTTTTTTTTT GAAAAATTGAAATTTTCGCGGTGCGGTTCAAGAAAAACCACAAAAACTCAATGATTTTTTAACGAAAATTTCAAATTTCTTGCAAGACCTACTGCAATTTCGATTTTTAGAAACTTTTTGAAAAAAATCCGAATTTTCTGATTTAGCCCC GCCCCAAAAATGGAAAGATTTCCGAAAATTCGAACCAAAAGTTCGCAAAAACTTGAATTTCTCTCACACAGATTGACGCGCTAATTTGAATTTTTCCAAAAATAAGCCCCGCCCCAAAAATGGACAAATTTTAAAAATTTTGAACCAAAT AAATTCAATTTTTTTTCGCTTTTTTCCGTTTTCGAACAAAAAATTCTAAAAATATATGGTTCTAGGCGGGGCTCAGGCACCCATCTACCTACTTAAAAATGCGTTAAATTTCAGGAATTAACTGCATCAACCGAACGGCGTCTCGCATTG TGTAGTCTGTATTTGGGCGAAGGAGATCTCGAAAAAAATCTGATCGCTGCGATCCGAGAAAGATCCGAAAAATCCGAGATTGAAGTGACGATTCTGTTGGATTTTTTGCGCGGAACACGGACCAATTCAAGCGGCGAAAGTAGTGTAACA GTGCTGAAACCTATTTCGGAAAAGTCAAAAGTTGGTTTTTTTTGCAAAAAAAAATCGATAAATCGATAAAAACCGACAATTTTGAGAATTTTCATTTCAAATTTGAGTCCCACATGCGCCTTTAAATATGGTGTACTGTAGTTTTAGCTC GAATGTTGAATTTCAAAAATTGAGAATAAAGAAATGTCGTGACGAGACCCACAAATGTTTTGAAAAAAATTTTCAATTTCAAAAAAATGTAAAAAATTGGGAATTTCCCTCCAAAAGTTAAATTGGTTTAGTCACAAACTTTGAAATTTT GAAATAAAATTTTTTTCGGCTAAAAATAAGTATTTTTTAAAAACTATTTTGAAGAAAAAAAGTTAGGTCTCGCCACGATGTATCTTGTATATGTGTATCTAAATTGCCATGTCGTGACGAGACCCTCTCATATTTTACACTGCAACTTTT TCCTCACGAGGGACGAGGAAAAGTGGTTTCTAGGCCATGGCCGAGGGGCCGACAAGTTTCATCGGCCATTTATCTTGCTTTGTTTTCCGCCTGTTTTCTTTCGTTTTTCACAGCTTTTTCCCATTTTTTCTTATTAAAACTGATAAATAA ATATTTTTGCAGATGCCAAAACGATTTTCAAGTAAAAAAATCATGTATTCAGTGGGCAAGCAGCGGTGAAAGTGGGCATTGTAATATGATGGATTACGGGAATACAAAACCTAAACTTTTTCTGAAACATGATACATATGATGCTTAAAT GCTGAGACTACCTGATTTTCATAACGAGACCGCTGAAAAAGTTTTGAGGTTTTCAAAATTCAACTTTTTGTGCGAAAATCTCGACTTTTTCACCGAAAAAGTTGAATTTTGGAAACCTCAAAACTTTTTCAGCGGTCTTGATATGAAAAT CAGGTAGCTTCAGCATCTAAGCAGCATATGTATCATGTTAAAGAAAAAGTTTAGGTTTTGTATTCCTGTAATCCATCATATTACATTGCCCACTTTCACCGCTGCTTGCCCACTGAATACATAATTTTTTCACTTGGAAATTGTTTTAGC Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

16 Example: C. elegans (I: 43,500-52,050) GAAGAAATGGAGCATTTGCGCTCCATCACACTCTCAGACAATTTCATTTTCCACATCCTATATATATTTTGGTTTTTCTGTCGTATTTTGTTTTAATTTATTGGTATTTCGTTCAAAAATAATTATTTTGACTGTATTTTTGGTTGCATA CATGTAGAACTGCTGTTTTTTAAGATATTCTGCCCATTCAAGTTTTTCAGTGTAAAATTGATATATTTCATTCCAACTGAAAATGAGATCGAAACGATGGAAAACCTCGGATATTACTGATTATGGAAAGAAGAGAAAAGAATCGGAAAG TTGTGGATCAAGTTCACCGATTCTCGAAACACAGTCATCTGGCGGTGCGGAACTTGACGAAGTTACTGAGGATGAATATTCTAGTAATTCGAGCAGTAATGAAACTAGCGACGAAGAGGAAAACTCAGAAGTACCAAATGTCTTATCTAT AACAGAAAGAGGTAAGAATTGCGTCTTCTAGTGATCATACTTTTCGCCAGATTCCCTAATGTAATATATTTTGTTGTAGAGAAAAGTTGGCAAAAGTTAACGGAAAACGATTTGGGACGAATTCGTTTCATCTTGAAGTACACTAGCAAT ACTAAAAAATGCGTGAACGAGTATTTTCAATATAATCATGGGCAAAACAATGAAATTATGAAAAGTCTATTATTGGATACCGATGGAACTATGACTGCAAAGGCTTGTTCGGAATGTGCCTACGATTTGAATCAGTAAGTTACTCTCTCG ATTTATTCCCAAAATTAATATGTGCTTCAGGTGCCACTGCAAAAAACCGCTTCGCTTCATCAATGCTCCGTGTGGTTGGTTTGCTATTCAAAACTATAAATAGTTCACTGTTTCCGTTCAGAGGTCATCAACCAAGTTCTTCATGTTGAA AATGCGGAGCCCACCAGGATCAACCATGTAATCGCAACACTCTTCCGGAATCACATTGGCGAGATTTTGTTGGTCCACTCTATTTCTGTGCGAGAACTGTGATAAAACTAGTATTTTCAGCACAAAGGCTCGAACTGCGGAAGCTCGCGC ATCTGAAGAAGCTCAAATCAGGATTCAAATCCAAGACAACTCGAACGCATTCCAAAGATCGTATCATAACGATCCACAACCTTCATCAGCCGAAGAACATGAGGAAGATATCGTGGTGGATGGCTGAGTACGGAGCTCAAATGCCTTAAG GCGAAACAATTGGTTTTTTAATTTGCTGGTTATCATGTTAGATTTTGAACGTGTTAGGTCTTTCAATTGTTTTTTTTTTTCGAAATGTTGTTGTTCTAATAAATTTGTTTTATTTAATCAAACGTTTTTTAGTCTACTACGGGCGTGAAG CCAGATATCAGTGGTATCTTCTTATCAGAAGCTGAATCATTTCCGGTTGACAATGTTTGAAGGACATAAGAAAGGCTGTGTTACTGATTTCGACCATTGATTTGTTTATATATGGATATGTTCCACTGCCTTTTGGAAAGGCAGTATTCC CGGTATATATGGGCCTAATACGGAATCTAAAATAACCTGACACAAACCTGACGTTGACCTGTTGCCGGCCCGCGGCGGCTTAGTGTCAACTTGACAGCGGGTCGCGATTTCACCTGCCAGTTGTTCTCCATTCAGCAGCCAGCGACCTGC TGGCAGGTTGCCACTAACCTGACGCGGTTTACCTGTGTTATCGGCGCGTGCATAGCTTAGTGGTTTCAGGAAATGATGCTAGTAATCAGAAGATCGGGGTTCGGGAAACGGCAGGGGCTTGAAGGTTAGGTTCTATGAAGCAGGGCGAAG GGTTGACAAGGAGAGGCAATAAGCAAGTAGTAGGGGTTCTCTAGAAAACATTTTTGTCTTTAATATGCGTTTCCTACTGATTTATTATTGATATTTGGATCCCCTTTTCTAGAAAAAAAAATCAGAATCAGCAGAAAAATTTGAGAAAAA GTCATAGCAAATCAGAGTTGGTCAGAGTAAATCAGAGCTAGTCATAGTAAATCATAGCTAGTCAGAGAATATCAGAGTTAATCAGGGTAATAAGTAGACCTAGTCATAGTAAATCAGAGCTAGGCATAGTAAAGCGTGGTTACTCCGAGT AAAACCACACTTGCACCGAACTGCGGTTAGTGTGCTTTACCATTATGTAACTCCGCTTTTTACTCTGAGTTAGTATGATATGGTTTGTCTGAGCTGTGGTTGGGCTTCGCGGGAAACTTGAATAATTCGAGACAAAATCTAATTTTAGCG AATTTTCTTTAATTTCTTTGAGGTTTCTACGACAGAACTCGAAAAATTTCGGGTTTTAATGTTTACACATTTTATTTAAAATTGAATAATCAACTGCGGGACTCCTCGAAAATCACATGCTCATTTAAATTTTGAAGTTCAAACCTCAAA AAACGCGCAAAAACCAAATTCAGCTAGGATATCAAATTTATGATTGAAATCTATATTTTGATGCGGTGTTTCTGAAGTTTTCGCGATAAAATCCGAATAATAATTCCACGTACCGTATATTCTCTATCTAATTTCCAGGTCATTTTTTAA TGCAGCACTATTAGAGACTGTCGTACTACTGGAGACTGCAGCATTAATTTTCGAACGGCTACTGTCAATTATAGATCACTAGTATTTAGTCACAAAAGCTAATTTTTTAAGCAGAAATTCATAAAAATGTTTTCAATATTGCGAACTTTT GTAACAAAAAGACCCAGTAATTCAATTACTTTCGTAAATTATCAAAAAATCATCAAAAATATACAAAAAAATACCAAAAAATATTGAAACTTTCAAGTGACTCTTTCAATAGAAAATGGGGTGCAGCACTAATAGAGACTGCTGCACTAT TTTTCGGACCCTTTTTGAATGCAGCACTATTAGAGACTGCAGTATTTACTACTGGAGATGCAGCACTAATAGAGAATATACGGTATATACGTAATATATTCTTGCAGAAAAAAGTACGATTATCAATGAAAAATAGCTGATAAGAGGCTT TTGTTTGAACTAACAGACGGAACGACTCCGGTTTAGTTCAAAAAATTCTAAAAACACGTTGTGTCAGGCTGTCTCATTGCGGTTTGATCTACGAAAAATGCGGGAATATTTTTCCAGAAAAATTGTGACGTCAGCACGCTCTTAACCATG CGAAACGAGATGAGATGTCTGCGTCTCTTTTCCCGCATTTTTCGAAGATCAAAACGAATGGGACTTTCTGACTCCACGTGTAAAAAGGGGTTACGACGGACCCTGGCCTAGAAATTAGGCGTGAAAATTCTCGGGCACTGGATGTAGTGA ACGCCCGCGATGAAAAATTGGGGGAAAATTAGGCTTTCTTTGCGAGAAAGATTAATTAAAAATGTTTTCCTTTGTCGAAAATAATTTTTAAAAAACACACCACGTGTATTCAGCTCGACCAACGCCTCGAAAATTTTCAAAAAAGGCGGG AAAAATTAGTTGAATTCGCCAAGAGGAATTTCACCGCAGCGCGTGCAAAAATTTCAGCATTTGCGCGTGACGGTGTTTGCACAAATTACACCGAATGGTCGAGCTGAAAACACGTGCACACTTTTAAATAAAACTAGAAAATAAATCCCA GGCCTGCAAATATTGCACACAAAACCGTAATCCCCTTCGCGCTAAACAACACGCGCAACGATGCTCCGCTTGGGGACAAGGAAAAATTAATTTAACTCGGGATTTTCATTAAAAAATTAGGTTTTTAGTTAATTTTTCGATGTTTTCACT GCGAAAAAGTGTTAAAATAACGATTTTTCAACCTATTTTCAATTAATCCGTGCAAAAAATCGTGTATTTCTCGAGTTTTGAAAGAAATTTATGAAAATCGGCATTTTTAATAATGGTTTTTCAAATAAAAATATAATTTTTCGGTGCAGA AAAGTCGTTGCTCGTACAGTTTTTTTAAAGCATTTTCACATCAAAATCCTCCATTTTTCCAGTAAATCGATATGGAGTGCGACGAGACAAAGCTGAGCGACGGCGCAAGCGGCTGGGTGCCGAGTATCCCGACAGATATCGATTCAAAAG ACACACCGTTGCTCGATATATCTTCTCAGGCGATTTGGGCGCTTTCCAGTTGTAAAAGCGGTAAATTTTCCGACTTTCAAGGGAGAAAAGTGTAGAAAAATCGAAATTACTTCTTAAAAATCTCGTAAAAATCGAATTCTTTCAGGATTC GGCATCGACGAGCTCCTATCCGACAGTGTTGAGAAATATTGGCAAAGCGATGGCCCGCAGCCGCACACGATTCTTCTAGAATTCCAGAAAAAGACCGACGTGGCTATGATGATGTTCTATTTGGATTTTAAAAACGACGAGTCTTATACA CCGTCAAAGTTAGCATTTTTGGCTTTTTCAAACGAAAAAATACAATGAAACACTGAATATCTAGTTTTTTTCTCAATTTTTGCCTAAAAAACGGCGATTTTTCACTAGCTTTTCAATTAAAATTTGAACAAAAAGTTTTTTAAAGGAAAA ACATGAATTTCTAGCTTTTTCAGAGGTTTTCTATTAAAAAATAGAGATTTTTGTGATATCTGACTGAAAAATTACCAAACTGTCGATTTTTTTAAACTATTTTTCACTTAAAATCTGCAATTTTTTTTTTCGAGGAAACATGTGAATTTC AAGCTTTTTCAGAGATTTTCTATGAAAAAGGTTCGTGCCGAGACCCATGTGCTTTTAAACTTCAGAATTTTCCCAATTTTGAAATTAAAAAGAGAATGAAAATTGATTTTCATGGAAAAATGCGTTTTTGGCCCAAAACCTCCAAAAAGT ACAAATATAGGTCGACTTTCAACTGTTTTAGATCAATTTTTTTGCAGAATTCAAGTAAAAATGGGTTCATCTCACCAGGATATATTTTTCCGTCAAACACAAACATTCAACGAGCCCCAGGGATGGACATTTATCGATTTACGCGACAAA AATGGGAAACCGAATCGCGTTTTTTGGCTTCAAGTACAAGTTATTCAGAATCATCAAAATGGGAGAGATACTCATATAAGGTAGAGGAATTGAGAATTTCAGAACGAAAATTGCCGAAAAAATGAAATTTTAGCGAATTTGAGTCGGAAA TTTCGAAATTTGATTGATTTTAAGCAAATTTCCAACTAAAATCTTGAAAATTTGATCTTTTTAGATAAATTTTTTTTTAATTTTGTGCTTTTCAAAAAACCTCAAAAAACAATTAAAAATTGAAGTAAAATTAATTTTTCAACAATTTTT GAAAGGCCGAATTTTTGATTGAAAATTTTCACAATTTGTCCATTTTGTGGTGGGGCTTATTCCGAAAAATCGTTGTTTTTTTTTTCAAAAAAGTTATAAAAACTTTAAAATTGCCATGTAAAATATGTTTATTCTCAGACCTCGTAGGCA CGAAGCAGGCGTAGGTCGCCTCGCAATAAATTTGAAAATCTCAAGAAAAATCAATAAATTTGTGATTAATCAAAAAAATTTAATTTCCTGGTCCCAGCACGAATGCTATTTTTCGAAAAAAAAAAAGAGGCGAGCCTAATATAGACCACG CCCACAAAATGGGCAAAAGTTTGATTTTTCAAAAAATCGAAACAAAAATTTTTCCAATTTTGTGAGATTTTAAAATTTCCGGTTTTTGGAAAATCGAAAAAAAATTTCTCGTTTTTTAATTTTCAAAAAAAATTGTGCCTAAAATTCAAA AAAAAAATCAATACTTTCTCAAAATTTCCAGAAAACAGTCCATTTTCCAGGCACGTTCGAGTCCTTGGACCCCAGCGATCTCGTGTCTCCACAACGAATCGAATATTCACCGGAGAACCACACGGACCGATTCCCGATAAAAATATCACT AATTTCGACGACGAGGATTTTGCCAATTTTATCGATCACTCACTTGTTCACTTATCACTTCGTTAAATTTACCTCCAGTGATTCCAGATAATGAGCCAGTTTTGCATTGAAATTTAGTGCCAAAATATAGAAAATCGCATGATTTAACAT AAAATAGCGTTTCGAATTGAAACAATGGAAAAAAAGTGCTATGATGATTTTTTAACACTTTTAATTGTTCCAATTTGAAGTAAAATCTATTTTCAGATAAATCAACTGATTTTCTATATTCTGCCACTAAAGCTTAAAAACTTGCCCTGC TGTCCTAACCTTCAAATTGTTCCCTGCAAATTTTATTATTCTTGTTTCATATTTTTGCGATTGCTTCGCGAGACCCAAACTCACACATTTACCTGTAAAATATAATCGAATAATTATTTATATATTTTCTGTAAATTTCCTTAGTATACT ATAAATTTTCTGATCTCTCTTCAAAAATCGCTAGAAAAAATAAACAAATGTCGGTTTAAAAATTCCTGGTAATTTACCTTCTATAGAAAATTTTTCGAAAAAAAAACCGAAGAAATTCAGATGGAAATTCCCGATCCCGAACTGCCGGGA ATACCGATTGATCCGCAAGATTTGGAGATTCTAGACACGCCCACACGGTTTTACGAGAAGCTTTTAGTGCGTTTTTCGTGTCGGGACCCGGAAATTTGACATTTTTGGCGCGCGGCTTGTTAGACTCCAAACCTTTTCAAAGATTTTTTT TTCGAATTAAATAACATTCGTGCTTGGGCCCGGAAATTGAATTTTTGATTTGAAAACAATTTTTTTTGAGTCCAAAATTTTCAAAGTTTGTCCATTTTTGGCGCGTGGCCTAGTAGGATCCGCCCCTTCTAAATTTTTTTTGAGCAAGTT TTCTGAAGCATTGATTTCAAAAATTTTTTTTGGAAATTTCTGGTTTATTTTTCCGGTTTTTTTCCGAGTTGCTGTTTAAGTTTGGAGAAATTCCAGAATTTGTCAATTTTTGGGGCGTGGCTTTTTCAGTAAGCACAGTTTTTTTTTTTT GAAAAATTGAAATTTTCGCGGTGCGGTTCAAGAAAAACCACAAAAACTCAATGATTTTTTAACGAAAATTTCAAATTTCTTGCAAGACCTACTGCAATTTCGATTTTTAGAAACTTTTTGAAAAAAATCCGAATTTTCTGATTTAGCCCC GCCCCAAAAATGGAAAGATTTCCGAAAATTCGAACCAAAAGTTCGCAAAAACTTGAATTTCTCTCACACAGATTGACGCGCTAATTTGAATTTTTCCAAAAATAAGCCCCGCCCCAAAAATGGACAAATTTTAAAAATTTTGAACCAAAT AAATTCAATTTTTTTTCGCTTTTTTCCGTTTTCGAACAAAAAATTCTAAAAATATATGGTTCTAGGCGGGGCTCAGGCACCCATCTACCTACTTAAAAATGCGTTAAATTTCAGGAATTAACTGCATCAACCGAACGGCGTCTCGCATTG TGTAGTCTGTATTTGGGCGAAGGAGATCTCGAAAAAAATCTGATCGCTGCGATCCGAGAAAGATCCGAAAAATCCGAGATTGAAGTGACGATTCTGTTGGATTTTTTGCGCGGAACACGGACCAATTCAAGCGGCGAAAGTAGTGTAACA GTGCTGAAACCTATTTCGGAAAAGTCAAAAGTTGGTTTTTTTTGCAAAAAAAAATCGATAAATCGATAAAAACCGACAATTTTGAGAATTTTCATTTCAAATTTGAGTCCCACATGCGCCTTTAAATATGGTGTACTGTAGTTTTAGCTC GAATGTTGAATTTCAAAAATTGAGAATAAAGAAATGTCGTGACGAGACCCACAAATGTTTTGAAAAAAATTTTCAATTTCAAAAAAATGTAAAAAATTGGGAATTTCCCTCCAAAAGTTAAATTGGTTTAGTCACAAACTTTGAAATTTT GAAATAAAATTTTTTTCGGCTAAAAATAAGTATTTTTTAAAAACTATTTTGAAGAAAAAAAGTTAGGTCTCGCCACGATGTATCTTGTATATGTGTATCTAAATTGCCATGTCGTGACGAGACCCTCTCATATTTTACACTGCAACTTTT TCCTCACGAGGGACGAGGAAAAGTGGTTTCTAGGCCATGGCCGAGGGGCCGACAAGTTTCATCGGCCATTTATCTTGCTTTGTTTTCCGCCTGTTTTCTTTCGTTTTTCACAGCTTTTTCCCATTTTTTCTTATTAAAACTGATAAATAA ATATTTTTGCAGATGCCAAAACGATTTTCAAGTAAAAAAATCATGTATTCAGTGGGCAAGCAGCGGTGAAAGTGGGCATTGTAATATGATGGATTACGGGAATACAAAACCTAAACTTTTTCTGAAACATGATACATATGATGCTTAAAT GCTGAGACTACCTGATTTTCATAACGAGACCGCTGAAAAAGTTTTGAGGTTTTCAAAATTCAACTTTTTGTGCGAAAATCTCGACTTTTTCACCGAAAAAGTTGAATTTTGGAAACCTCAAAACTTTTTCAGCGGTCTTGATATGAAAAT CAGGTAGCTTCAGCATCTAAGCAGCATATGTATCATGTTAAAGAAAAAGTTTAGGTTTTGTATTCCTGTAATCCATCATATTACATTGCCCACTTTCACCGCTGCTTGCCCACTGAATACATAATTTTTTCACTTGGAAATTGTTTTAGC Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

17 Quasi-Standard: Hidden Markov Models Model sequence content: One state per segment type Allow only plausible transitions Content statistics at each state Derived from known genes Prediction: Given DNA, find most likely state sequences Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

18 Quasi-Standard: Hidden Markov Models Model sequence content: One state per segment type Allow only plausible transitions Content statistics at each state Derived from known genes Prediction: Given DNA, find most likely state sequences Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

19 Quasi-Standard: Hidden Markov Models Model sequence content: One state per segment type Allow only plausible transitions Content statistics at each state Derived from known genes Prediction: Given DNA, find most likely state sequences Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

20 Quasi-Standard: Hidden Markov Models Model sequence content: One state per segment type Allow only plausible transitions Content statistics at each state Derived from known genes Prediction: Given DNA, find most likely state sequences p(x, y) = L 1 i=1 p(x i y i )p(y i+1 y i ) Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

21 Generative Models Hidden Markov Models [Rabiner, 1989] State sequence treated as (first order) Markov chain No direct dependencies between observations p(x, y) = i p(x i y i )p(y i y i 1 ) Y 1 Y 2... Y n X 1 X 2... Maximum likelihood estimation: max θ X n N log p(x n, y n ) n=1 Decoding: Viterbi/dynamic programming (DP) algorithms Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

22 Generative Models Hidden Markov Models [Rabiner, 1989] State sequence treated as (first order) Markov chain No direct dependencies between observations p(x, y) = i p(x i y i )p(y i y i 1 ) Y 1 Y 2... Y n X 1 X 2... Maximum likelihood estimation: max θ X n N log p(x n, y n ) n=1 Decoding: Viterbi/dynamic programming (DP) algorithms Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

23 Generative Models Hidden Markov Models [Rabiner, 1989] State sequence treated as (first order) Markov chain No direct dependencies between observations p(x, y) = i p(x i y i )p(y i y i 1 ) Y 1 Y 2... Y n X 1 X 2... Maximum likelihood estimation: max θ X n N log p(x n, y n ) n=1 Decoding: Viterbi/dynamic programming (DP) algorithms Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

24 Discriminative Models Conditional Random Fields [Lafferty et al., 2001] conditional probability p(y x) instead of joint probability p(x, y) p w (y x) = 1 Z(x, w) exp(f w(y x)) Y 1 Y 2... Y n X = X 1, X 2,..., X n Can handle non-independent input features N Maximum likelihood estimation: max log p w (y n x n ) w Decoding: Viterbi or MEA algorithms [Gross et al., 2006] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23, n=1

25 Discriminative Models Conditional Random Fields [Lafferty et al., 2001] conditional probability p(y x) instead of joint probability p(x, y) p w (y x) = 1 Z(x, w) exp(f w(y x)) Y 1 Y 2... Y n X = X 1, X 2,..., X n Can handle non-independent input features N Maximum likelihood estimation: max log p w (y n x n ) w Decoding: Viterbi or MEA algorithms [Gross et al., 2006] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23, n=1

26 Discriminative Models Conditional Random Fields [Lafferty et al., 2001] conditional probability p(y x) instead of joint probability p(x, y) p w (y x) = 1 Z(x, w) exp(f w(y x)) Y 1 Y 2... Y n X = X 1, X 2,..., X n Can handle non-independent input features N Maximum likelihood estimation: max log p w (y n x n ) w Decoding: Viterbi or MEA algorithms [Gross et al., 2006] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23, n=1

27 Discriminative Models Conditional Random Fields [Lafferty et al., 2001] conditional probability p(y x) instead of joint probability p(x, y) p w (y x) = 1 Z(x, w) exp(f w(y x)) Y 1 Y 2... Y n X = X 1, X 2,..., X n Can handle non-independent input features N Maximum likelihood estimation: max log p w (y n x n ) w Decoding: Viterbi or MEA algorithms [Gross et al., 2006] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23, n=1

28 Max-Margin Structured Output Learning Learn function f (y x) scoring segmentations y for x Maximize f (y x) w.r.t. y for prediction: argmax f (y x) y Υ Idea: f (y x) f (ŷ x) for wrong labels ŷ y Max-margin parameter estimation: [Altun et al., 2003] Given N sequence pairs (x 1, y 1 ),..., (x N, y N ) for training Solve optimization problem: max f,ρ R,ξ R N + w.r.t. ρ C N n=1 ξ n f (y n x n ) f (y x n ) ρ ξ n for all y n y Υ, n = 1,..., N Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

29 Max-Margin Structured Output Learning Learn function f (y x) scoring segmentations y for x Maximize f (y x) w.r.t. y for prediction: argmax f (y x) y Υ Idea: f (y x) f (ŷ x) for wrong labels ŷ y Max-margin parameter estimation: [Altun et al., 2003] Given N sequence pairs (x 1, y 1 ),..., (x N, y N ) for training Solve optimization problem: max f,ρ R,ξ R N + w.r.t. ρ C N n=1 ξ n f (y n x n ) f (y x n ) ρ ξ n for all y n y Υ, n = 1,..., N Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

30 Max-Margin Structured Output Learning Learn function f (y x) scoring segmentations y for x Maximize f (y x) w.r.t. y for prediction: argmax f (y x) y Υ Idea: f (y x) f (ŷ x) for wrong labels ŷ y Max-margin parameter estimation: [Altun et al., 2003] Given N sequence pairs (x 1, y 1 ),..., (x N, y N ) for training Solve optimization problem: max f,ρ R,ξ R N + w.r.t. ρ C N n=1 ξ n f (y n x n ) f (y x n ) ρ ξ n for all y n y Υ, n = 1,..., N Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

31 Structured Output Learning with Φ(x, y) Assume f (y x) = w, Φ(x, y), where w, Φ(x, y) F, w 2 = 1 N ρ C max w =1,ρ R,ξ R N + n=1 ξ n w.r.t. w, Φ(x n, y n ) Φ(x n, y) ρ ξ n (1) for all y n y Υ, n = 1,..., N Linear classifier that separates true from wrong labellings N Representer Theorem: w = α n,y Φ(x n, y) n=1 y Υ Corollary: (1) only depend on products of Φ s Define kernel function : k((x, y), (x, y )) = Φ(x, y), Φ(x, y ) Kernel Trick (everything only depends on k) Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

32 Structured Output Learning with Φ(x, y) Assume f (y x) = w, Φ(x, y), where w, Φ(x, y) F, w 2 = 1 N ρ C max w =1,ρ R,ξ R N + n=1 ξ n w.r.t. w, Φ(x n, y n ) Φ(x n, y) ρ ξ n (1) for all y n y Υ, n = 1,..., N Linear classifier that separates true from wrong labellings N Representer Theorem: w = α n,y Φ(x n, y) n=1 y Υ Corollary: (1) only depend on products of Φ s Define kernel function : k((x, y), (x, y )) = Φ(x, y), Φ(x, y ) Kernel Trick (everything only depends on k) Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

33 Structured Output Learning with Φ(x, y) Assume f (y x) = w, Φ(x, y), where w, Φ(x, y) F, w 2 = 1 N ρ C max w =1,ρ R,ξ R N + n=1 ξ n w.r.t. w, Φ(x n, y n ) Φ(x n, y) ρ ξ n (1) for all y n y Υ, n = 1,..., N Linear classifier that separates true from wrong labellings N Representer Theorem: w = α n,y Φ(x n, y) n=1 y Υ Corollary: (1) only depend on products of Φ s Define kernel function : k((x, y), (x, y )) = Φ(x, y), Φ(x, y ) Kernel Trick (everything only depends on k) Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

34 Structured Output Learning with Φ(x, y) Assume f (y x) = w, Φ(x, y), where w, Φ(x, y) F, w 2 = 1 N ρ C max w =1,ρ R,ξ R N + n=1 ξ n w.r.t. w, Φ(x n, y n ) Φ(x n, y) ρ ξ n (1) for all y n y Υ, n = 1,..., N Linear classifier that separates true from wrong labellings N Representer Theorem: w = α n,y Φ(x n, y) n=1 y Υ Corollary: (1) only depend on products of Φ s Define kernel function : k((x, y), (x, y )) = Φ(x, y), Φ(x, y ) Kernel Trick (everything only depends on k) Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

35 Structured Output Learning with Φ(x, y) Assume f (y x) = w, Φ(x, y), where w, Φ(x, y) F, w 2 = 1 N ρ C max w =1,ρ R,ξ R N + n=1 ξ n w.r.t. w, Φ(x n, y n ) Φ(x n, y) ρ ξ n (1) for all y n y Υ, n = 1,..., N Linear classifier that separates true from wrong labellings N Representer Theorem: w = α n,y Φ(x n, y) n=1 y Υ Corollary: (1) only depend on products of Φ s Define kernel function : k((x, y), (x, y )) = Φ(x, y), Φ(x, y ) Kernel Trick (everything only depends on k) Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

36 Optimization Strategy max w =1,ρ,ξ w.r.t. ρ C N n=1 ξ n w, Φ(x, y n ) Φ(x, y) ρ ξ n for all y n y Υ, n = 1,..., N Big: one constraint per example and wrong labeling Iterative solution Begin with small set of wrong labellings Solve reduced optimization problem Find labellings that violate constraints Add constraints, resolve Guaranteed Convergence: 2 : see Tsochantaridis et al. [2005] 1 : see Rätsch and Warmuth [2005] (log(n)/ɛ 2 iterations) Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

37 How to find violated constraints? Constraint Find labeling y that maximizes w, Φ(x, y n ) Φ(x, y) ρ ξ n w, Φ(x, y) Use Dynamic Programming Decoding y = argmax w, Φ(x, y) y Υ (DP only works if Φ has certain decomposition structure) If y = y n, then compute second best labeling as well If constraint is violated, then add to optimization problem Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

38 Column-Generation Algorithm 1 Υ 1 n =, for n = 1,..., N 2 Solve (w t, ρ, ξ t ) = argmax w =1,ρ,ξ w.r.t. ρ C N n=1 ξ n w, Φ(x, y n ) Φ(x, y) ρ ξ n for all y n y Υ t n, n = 1,..., N 3 Find violated constraints (n = 1,..., N) y t n = argmax y n y Υ w t, Φ(x, y) If w t, Φ(x, y n ) Φ(x, y t n) < ρ ξ t n, set Υ t+1 n = Υ t n {y t n} 4 If violated constraint exists, then go to 2 5 Otherwise terminate Optimal solution Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

39 Problems Optimization may require many iterations Number of variables increases linearly When using kernels, solving optimization problems can become easily infeasible Evaluation of w, Φ(x, y) in dynamic programming can be very expensive Optimization and decoding become too expensive Approximation algorithms needed Idea: Decompose problem First part uses kernels, can be precomputed Second part without kernels and only combines ingredients Solve problems with 10, 000 examples, instead of just 500 Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

40 Problems Optimization may require many iterations Number of variables increases linearly When using kernels, solving optimization problems can become easily infeasible Evaluation of w, Φ(x, y) in dynamic programming can be very expensive Optimization and decoding become too expensive Approximation algorithms needed Idea: Decompose problem First part uses kernels, can be precomputed Second part without kernels and only combines ingredients Solve problems with 10, 000 examples, instead of just 500 Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

41 Problems Optimization may require many iterations Number of variables increases linearly When using kernels, solving optimization problems can become easily infeasible Evaluation of w, Φ(x, y) in dynamic programming can be very expensive Optimization and decoding become too expensive Approximation algorithms needed Idea: Decompose problem First part uses kernels, can be precomputed Second part without kernels and only combines ingredients Solve problems with 10, 000 examples, instead of just 500 Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

42 Problems Optimization may require many iterations Number of variables increases linearly When using kernels, solving optimization problems can become easily infeasible Evaluation of w, Φ(x, y) in dynamic programming can be very expensive Optimization and decoding become too expensive Approximation algorithms needed Idea: Decompose problem First part uses kernels, can be precomputed Second part without kernels and only combines ingredients Solve problems with 10, 000 examples, instead of just 500 Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

43 Back to Computational Gene Finding Given a piece of DNA sequence Predict gene products inluding the intermediate processing steps Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

44 Back to Computational Gene Finding DNA TSS polya/cleavage pre-mrna Splice Donor Splice Acceptor Splice Splice Donor Acceptor mrna TIS Stop cap polya Protein Predict signals used during processing Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

45 Back to Computational Gene Finding DNA TSS Donor Acceptor Donor Acceptor polya/cleavage pre-mrna TIS Stop mrna cap polya Protein Predict signals used during processing Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

46 Back to Computational Gene Finding DNA TSS Donor Acceptor Donor Acceptor polya/cleavage pre-mrna TIS Stop mrna cap polya Protein Predict signals used during processing Predict the correct corresponding label sequence with labels intergenic, exon, intron, 5 UTR, etc. Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

47 Example: Splice Site Recognition True Splice Sites Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

48 Example: Splice Site Recognition True Splice Sites CT...GTCGTA...GAAGCTAGGAGCGC...ACGCGT...GA 150 nucleotides window around dimer Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

49 Example: Splice Site Recognition Potential Splice Sites CT...GTCGTA...GAAGCTAGGAGCGC...ACGCGT...GA 150 nucleotides window around dimer Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

50 Example: Splice Site Recognition Potential Splice Sites CT...GTCGTA...GAAGCTAGGAGCGC...ACGCGT...GA 150 nucleotides window around dimer. True sites: fixed window around a true splice site Decoy sites: all other consensus sites Millions of examples from EST databases Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

51 Example: Splice Site Recognition Potential Splice Sites CT...GTCGTA...GAAGCTAGGAGCGC...ACGCGT...GA Basic idea: 150 nucleotides window around dimer For instance, exploit that exons have higher GC content or that specific motifs appear near splice sites. [Sonnenburg et al., 2007b] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

52 Example: Splice Site Recognition Potential Splice Sites CT...GTCGTA...GAAGCTAGGAGCGC...ACGCGT...GA Basic idea: 150 nucleotides window around dimer In practice: Use one feature per possible substring (e.g. 20) at all positions 150 ( ) features Needs efficient algorithms Leads to most accurate predictors that currently exist Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

53 Example: Splice Site Recognition Potential Splice Sites CT...GTCGTA...GAAGCTAGGAGCGC...ACGCGT...GA Basic idea: 150 nucleotides window around dimer In practice: Use one feature per possible substring (e.g. 20) at all positions 150 ( ) features Needs efficient algorithms Leads to most accurate predictors that currently exist Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

54 Kernels and String Data Structures Good news: Use WD kernel to implicitly consider features: [Rätsch et al., 2007, Rätsch and Sonnenburg, 2004] Drawback: Training too expensive for millions of examples Needs more sophisticated algorithms: Exploit that features are sparse and can be explicitly represented String indexing data structures [Sonnenburg et al., 2007a] Use novel optimization techniques Use subgradient-based optimization [Franc and Sonnenburg, 2009] Implemented in a freely available software package [Sonnenburg et al., 2007a] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

55 Kernels and String Data Structures Good news: Use WD kernel to implicitly consider features: [Rätsch et al., 2007, Rätsch and Sonnenburg, 2004] Drawback: Training is too expensive for millions of examples Needs more sophisticated algorithms: Exploit that features are sparse and can be explicitly represented String indexing data structures [Sonnenburg et al., 2007a] Use novel optimization techniques Use subgradient-based optimization [Franc and Sonnenburg, 2009] Implemented in a freely available software package [Sonnenburg et al., 2007a] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

56 Kernels and String Data Structures Good news: Use WD kernel to implicitly consider features: [Rätsch et al., 2007, Rätsch and Sonnenburg, 2004] Drawback: Training was too expensive for millions of examples Needs more sophisticated algorithms: Exploit that features are sparse and can be explicitly represented String indexing data structures [Sonnenburg et al., 2007a] Use novel optimization techniques Use subgradient-based optimization [Franc and Sonnenburg, 2009] Implemented in a freely available software package [Sonnenburg et al., 2007a] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

57 Results on Splice Site Recognition Worm Fly Cress Fish Human Acc Don Acc Don Acc Don Acc Don Acc Don Markov Chain auprc(%) SVM auprc(%) [Sonnenburg, Schweikert, Philips, Behr, Rätsch, 2007] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

58 Human Splice Sites with WD Kernel Performance (=Area under PRC curve) keeps increasing. Pays off to use all available data! [Sonnenburg et al., 2007b] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

59 Human Splice Sites with WD Kernel Performance (=Area under PRC curve) keeps increasing. Pays off to use all available data! [Sonnenburg et al., 2007b] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

60 Example: Predictions in UCSC Browser [Rätsch et al., 2007, Schweikert et al., 2009c] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

61 Example: Predictions in UCSC Browser [Rätsch et al., 2007, Schweikert et al., 2009c] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

62 Example: Predictions in UCSC Browser Signals have to appear in the right order TSS TIS Stop cleave Don Acc Based on known genes, learn how to combine predictions for accurate gene prediction. Prediction of Structured Outputs [Rätsch et al., 2007, Schweikert et al., 2009c] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

63 Discriminative Gene Prediction (simplified) [Rätsch, Sonnenburg, Srinivasan, Witte, Müller, Sommer, Schölkopf, 2007] Simplified Model: Score for splice form y = {(p j, q j )} J j=1 : J 1 F (y) := S GT (fj GT ) + j=1 J S AG (f AG j=2 j ) } {{ } Splice signals S LI (p j+1 q j ) + J 1 + j=1 J S LE (q j p j ) j=1 } {{ } Segment lengths Tune free parameters by solving optimization problem. Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

64 Discriminative Gene Prediction (simplified) [Rätsch, Sonnenburg, Srinivasan, Witte, Müller, Sommer, Schölkopf, 2007] Simplified Model: Score for splice form y = {(p j, q j )} J j=1 : J 1 F (y) := S GT (fj GT ) + j=1 J S AG (f AG j=2 j ) } {{ } Splice signals S LI (p j+1 q j ) + J 1 + j=1 J S LE (q j p j ) j=1 } {{ } Segment lengths Tune free parameters by solving optimization problem. Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

65 Discriminative Gene Prediction (simplified) Take-Home Messages Simplified Model: Avoid Score solving for splice a huge form optimization y = {(p j, q j )} J j=1 : J 1 problem by splitting J 1 it into two parts F (y) := S GT (fj GT ) + j=1 [Rätsch, Sonnenburg, Srinivasan, Witte, Müller, Sommer, Schölkopf, 2007] j ) } {{ } Splice signals J S AG (f AG + S LI (p j+1 q j ) + Exploit state-of-the-art signal j=2 j=1 predictions J S LE (q j p j ) j=1 } {{ } Segment lengths Predict more accurately than Tune free parameters generative by solving models optimization problem. Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

66 Results using mgene Most accurate ab initio method in the ngasp genome annotation challenge for C. elegans [Coghlan et al., 2008] Validation of gene predictions for C. elegans: [Schweikert et al., 2009c] No. of genes No. of genes Frac. of genes analyzed w/ expression New genes 2, % Missing unconf. genes % Annotation of other nematode genomes: [Schweikert et al., 2009c] Genome Genome No. of No. exons/gene mgene best other size [Mbp] genes (mean) accuracy accuracy C. remanei % 93.8% C. japonica % 88.7% C. brenneri % 87.8% C. briggsae % 82.0% Web service for ab initio gene predictions [Schweikert et al., 2009b] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

67 Results using mgene Most accurate ab initio method in the ngasp genome annotation challenge for C. elegans [Coghlan et al., 2008] Validation of gene predictions for C. elegans: [Schweikert et al., 2009c] No. of genes No. of genes Frac. of genes analyzed w/ expression New genes 2, % Missing unconf. genes % Annotation of other nematode genomes: [Schweikert et al., 2009c] Genome Genome No. of No. exons/gene mgene best other size [Mbp] genes (mean) accuracy accuracy C. remanei % 93.8% C. japonica % 88.7% C. brenneri % 87.8% C. briggsae % 82.0% Web service for ab initio gene predictions [Schweikert et al., 2009b] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

68 Results using mgene Most accurate ab initio method in the ngasp genome annotation challenge for C. elegans [Coghlan et al., 2008] Validation of gene predictions for C. elegans: [Schweikert et al., 2009c] No. of genes No. of genes Frac. of genes analyzed w/ expression New genes 2, % Missing unconf. genes % Annotation of other nematode genomes: [Schweikert et al., 2009c] Genome Genome No. of No. exons/gene mgene best other size [Mbp] genes (mean) accuracy accuracy C. remanei % 93.8% C. japonica % 88.7% C. brenneri % 87.8% C. briggsae % 82.0% Web service for ab initio gene predictions [Schweikert et al., 2009b] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

69 Results using mgene Most accurate ab initio method in the ngasp genome annotation challenge for C. elegans [Coghlan et al., 2008] Validation of gene predictions for C. elegans: [Schweikert et al., 2009c] No. of genes No. of genes Frac. of genes analyzed w/ expression New genes 2, % Missing unconf. genes % Annotation of other nematode genomes: [Schweikert et al., 2009c] Genome Genome No. of No. exons/gene mgene best other size [Mbp] genes (mean) accuracy accuracy C. remanei % 93.8% C. japonica % 88.7% C. brenneri % 87.8% C. briggsae % 82.0% Web service for ab initio gene predictions [Schweikert et al., 2009b] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

70 mgene.web: Gene Finding for Everybody ;-) mgene.org/web [Schweikert et al., 2009b] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

71 Limitations/Extensions Gene finding (by far) still not perfect New ab initio techniques more accurate Many genes mis-predicted (C. elegans: 50%; human: 80%) What is missing? Not enough examples for training? Model complexity? Other information? Transcriptome is the result of many complex processes Current methods also ignore other important information: Chromatin structure and methylation patterns RNA structure and processing regulation... Ab initio methods cannot predict alternative transcripts Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

72 Limitations/Extensions Gene finding (by far) still not perfect New ab initio techniques more accurate Many genes mis-predicted (C. elegans: 50%; human: 80%) What is missing? Not enough examples for training? Model complexity? Other information? Transcriptome is the result of many complex processes Current methods also ignore other important information: Chromatin structure and methylation patterns RNA structure and processing regulation... Ab initio methods cannot predict alternative transcripts Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

73 Limitations/Extensions Gene finding (by far) still not perfect New ab initio techniques more accurate Many genes mis-predicted (C. elegans: 50%; human: 80%) What is missing? Not enough examples for training? Model complexity? Other information? Transcriptome is the result of many complex processes Current methods also ignore other important information: Chromatin structure and methylation patterns RNA structure and processing regulation... Ab initio methods cannot predict alternative transcripts Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

74 Limitations/Extensions Gene finding (by far) still not perfect New ab initio techniques more accurate Many genes mis-predicted (C. elegans: 50%; human: 80%) What is missing? Not enough examples for training? Model complexity? Other information? Transcriptome is the result of many complex processes Current methods also ignore other important information: Chromatin structure and methylation patterns RNA structure and processing regulation... Ab initio methods cannot predict alternative transcripts Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

75 Gene Finding + Auxiliary Measurements Is there other information that may be used by cellular processes that can improve prediction results? Preliminary study: (A. thaliana) transcript level (SN + SP)/2 1 mgene (ab initio) % 2... with DNA methylation (1 tissue) 76.1% 3... with Nucleosome position predictions 78.0% 4... with RNA secondary structure predictions 76.7% In progress: Study effect of other information sources for human gene prediction Ideally, consider measurements from the same sample [Behr et al., 2009, in prep.] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

76 Gene Finding + Auxiliary Measurements Is there other information that may be used by cellular processes that can improve prediction results? Preliminary study: (A. thaliana) transcript level (SN + SP)/2 1 mgene (ab initio) % 2... with DNA methylation (1 tissue) 76.1% 3... with Nucleosome position predictions 78.0% 4... with RNA secondary structure predictions 76.7% In progress: Study effect of other information sources for human gene prediction Ideally, consider measurements from the same sample [Behr et al., 2009, in prep.] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

77 Gene Finding + Auxiliary Measurements Is there other information that may be used by cellular processes that can improve prediction results? Preliminary study: (A. thaliana) transcript level (SN + SP)/2 1 mgene (ab initio) % 2... with DNA methylation (1 tissue) 76.1% 3... with Nucleosome position predictions 78.0% 4... with RNA secondary structure predictions 76.7% In progress: Study effect of other information sources for human gene prediction Ideally, consider measurements from the same sample [Behr et al., 2009, in prep.] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

78 Gene Finding + Auxiliary Measurements Is there other information that may be used by cellular processes that can improve prediction results? Preliminary study: (A. thaliana) transcript level (SN + SP)/2 1 mgene (ab initio) % 2... with DNA methylation (1 tissue) 76.1% 3... with Nucleosome position predictions 78.0% 4... with RNA secondary structure predictions 76.7% In progress: Study effect of other information sources for human gene prediction Ideally, consider measurements from the same sample [Behr et al., 2009, in prep.] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

79 Gene Finding + Auxiliary Measurements Is there other information that may be used by cellular processes that can improve prediction results? Preliminary study: (A. thaliana) transcript level (SN + SP)/2 1 mgene (ab initio) % 2... with DNA methylation (1 tissue) 76.1% 3... with Nucleosome position predictions 78.0% 4... with RNA secondary structure predictions 76.7% In progress: Study effect of other information sources for human gene prediction Ideally, consider measurements from the same sample [Behr et al., 2009, in prep.] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

80 Alternative Transcripts: First Steps Predictions of alternative splicing Predict novel alternative splicing as independent events Only use sequence information [Rätsch et al., 2005] Requires known gene structures: Use ab initio predictions Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

81 Alternative Transcripts: More Steps Combine gene finding with prediction of alternative splicing Machine learning challenge: [Zien et al., 2006] Input: DNA sequence Output: Splicing graph mgene approach can be extended to Predict simple splicing graphs But, without clean data we cannot train the model... Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

82 Alternative Transcripts: More Steps Combine gene finding with prediction of alternative splicing Machine learning challenge: [Zien et al., 2006] Input: DNA sequence Output: Splicing graph mgene approach can be extended to Predict simple splicing graphs TSS TIS Stop cleave Don Acc Exon skip A-Don A-Acc But, without clean data we cannot train the model... Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

83 Alternative Transcripts: More Steps Combine gene finding with prediction of alternative splicing Machine learning challenge: [Zien et al., 2006] Input: DNA sequence Output: Splicing graph mgene approach can be extended to Predict simple splicing graphs TSS TIS Stop cleave Don Acc Exon skip A-Don A-Acc But, without clean data we cannot train the model... Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

84 Reconstructing the Transcriptome DNA mrna cap polya Protein Directly measure the transcriptome Whole genome tiling arrays Transcriptome sequencing (Sanger or Next Generation Sequencing) Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

85 mrna Deep Sequencing [Wikipedia] Gunnar Rätsch (FML, Tübingen) Large Scale Sequence Analysis October, 23,

Large Scale Sequence Analysis with Applications to Genomics

Large Scale Sequence Analysis with Applications to Genomics Large Scale Sequence Analysis with Applications to Genomics Gunnar Rätsch, Max Planck Society Tübingen, Germany Talk at CSML, University College London March 18, 2009 Gunnar Rätsch (FML, Tübingen) Large

Lisätiedot

Capacity Utilization

Capacity Utilization Capacity Utilization Tim Schöneberg 28th November Agenda Introduction Fixed and variable input ressources Technical capacity utilization Price based capacity utilization measure Long run and short run

Lisätiedot

The CCR Model and Production Correspondence

The CCR Model and Production Correspondence The CCR Model and Production Correspondence Tim Schöneberg The 19th of September Agenda Introduction Definitions Production Possiblity Set CCR Model and the Dual Problem Input excesses and output shortfalls

Lisätiedot

Efficiency change over time

Efficiency change over time Efficiency change over time Heikki Tikanmäki Optimointiopin seminaari 14.11.2007 Contents Introduction (11.1) Window analysis (11.2) Example, application, analysis Malmquist index (11.3) Dealing with panel

Lisätiedot

Alternative DEA Models

Alternative DEA Models Mat-2.4142 Alternative DEA Models 19.9.2007 Table of Contents Banker-Charnes-Cooper Model Additive Model Example Data Home assignment BCC Model (Banker-Charnes-Cooper) production frontiers spanned by convex

Lisätiedot

Returns to Scale II. S ysteemianalyysin. Laboratorio. Esitelmä 8 Timo Salminen. Teknillinen korkeakoulu

Returns to Scale II. S ysteemianalyysin. Laboratorio. Esitelmä 8 Timo Salminen. Teknillinen korkeakoulu Returns to Scale II Contents Most Productive Scale Size Further Considerations Relaxation of the Convexity Condition Useful Reminder Theorem 5.5 A DMU found to be efficient with a CCR model will also be

Lisätiedot

Other approaches to restrict multipliers

Other approaches to restrict multipliers Other approaches to restrict multipliers Heikki Tikanmäki Optimointiopin seminaari 10.10.2007 Contents Short revision (6.2) Another Assurance Region Model (6.3) Cone-Ratio Method (6.4) An Application of

Lisätiedot

Bounds on non-surjective cellular automata

Bounds on non-surjective cellular automata Bounds on non-surjective cellular automata Jarkko Kari Pascal Vanier Thomas Zeume University of Turku LIF Marseille Universität Hannover 27 august 2009 J. Kari, P. Vanier, T. Zeume (UTU) Bounds on non-surjective

Lisätiedot

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31) On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31) Juha Kahkonen Click here if your download doesn"t start automatically On instrument costs

Lisätiedot

Capacity utilization

Capacity utilization Mat-2.4142 Seminar on optimization Capacity utilization 12.12.2007 Contents Summary of chapter 14 Related DEA-solver models Illustrative examples Measure of technical capacity utilization Price-based measure

Lisätiedot

Information on preparing Presentation

Information on preparing Presentation Information on preparing Presentation Seminar on big data management Lecturer: Spring 2017 20.1.2017 1 Agenda Hints and tips on giving a good presentation Watch two videos and discussion 22.1.2017 2 Goals

Lisätiedot

16. Allocation Models

16. Allocation Models 16. Allocation Models Juha Saloheimo 17.1.27 S steemianalsin Optimointiopin seminaari - Sks 27 Content Introduction Overall Efficienc with common prices and costs Cost Efficienc S steemianalsin Revenue

Lisätiedot

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31) On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31) Juha Kahkonen Click here if your download doesn"t start automatically On instrument costs

Lisätiedot

Gap-filling methods for CH 4 data

Gap-filling methods for CH 4 data Gap-filling methods for CH 4 data Sigrid Dengel University of Helsinki Outline - Ecosystems known for CH 4 emissions; - Why is gap-filling of CH 4 data not as easy and straight forward as CO 2 ; - Gap-filling

Lisätiedot

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31) On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31) Juha Kahkonen Click here if your download doesn"t start automatically On instrument costs

Lisätiedot

Functional Genomics & Proteomics

Functional Genomics & Proteomics Functional Genomics & Proteomics Genome Sequences TCACAATTTAGACATCTAGTCTTCCACTTAAGCATATTTAGATTGTTTCCAGTTTTCAGCTTTTATGACTAAATCTTCTAAAATTGTTTTTCCCTAAATGTATATTTTAATTTGTCTCAGGAGTAGAATTTCTGAGTCATAAAGCGGT CATATGTATAAATTTTAGGTGCCTCATAGCTCTTCAAATAGTCATCCCATTTTATACATCCAGGCAATATATGAGAGTTCTTGGTGCTCCACATCTTAGCTAGGATTTGATGTCAACCAGTCTCTTTAATTTAGATATTCTAGTACAT

Lisätiedot

Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks. David R. Kelley

Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks. David R. Kelley Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks David R. Kelley DNA codes for complex life. How? Kundaje et al. Integrative analysis of 111 reference

Lisätiedot

Chapter 7. Motif finding (week 11) Chapter 8. Sequence binning (week 11)

Chapter 7. Motif finding (week 11) Chapter 8. Sequence binning (week 11) Course organization Introduction ( Week 1) Part I: Algorithms for Sequence Analysis (Week 1-11) Chapter 1-3, Models and theories» Probability theory and Statistics (Week 2)» Algorithm complexity analysis

Lisätiedot

Genome 373: Genomic Informatics. Professors Elhanan Borenstein and Jay Shendure

Genome 373: Genomic Informatics. Professors Elhanan Borenstein and Jay Shendure Genome 373: Genomic Informatics Professors Elhanan Borenstein and Jay Shendure Genome 373 This course is intended to introduce students to the breadth of problems and methods in computational analysis

Lisätiedot

Alternatives to the DFT

Alternatives to the DFT Alternatives to the DFT Doru Balcan Carnegie Mellon University joint work with Aliaksei Sandryhaila, Jonathan Gross, and Markus Püschel - appeared in IEEE ICASSP 08 - Introduction Discrete time signal

Lisätiedot

Statistical design. Tuomas Selander

Statistical design. Tuomas Selander Statistical design Tuomas Selander 28.8.2014 Introduction Biostatistician Work area KYS-erva KYS, Jyväskylä, Joensuu, Mikkeli, Savonlinna Work tasks Statistical methods, selection and quiding Data analysis

Lisätiedot

Mat Seminar on Optimization. Data Envelopment Analysis. Economies of Scope S ysteemianalyysin. Laboratorio. Teknillinen korkeakoulu

Mat Seminar on Optimization. Data Envelopment Analysis. Economies of Scope S ysteemianalyysin. Laboratorio. Teknillinen korkeakoulu Mat-2.4142 Seminar on Optimization Data Envelopment Analysis Economies of Scope 21.11.2007 Economies of Scope Introduced 1982 by Panzar and Willing Support decisions like: Should a firm... Produce a variety

Lisätiedot

FinFamily PostgreSQL installation ( ) FinFamily PostgreSQL

FinFamily PostgreSQL installation ( ) FinFamily PostgreSQL FinFamily PostgreSQL 1 Sisällys / Contents FinFamily PostgreSQL... 1 1. Asenna PostgreSQL tietokanta / Install PostgreSQL database... 3 1.1. PostgreSQL tietokannasta / About the PostgreSQL database...

Lisätiedot

Bioinformatics in Laboratory of Computer and Information Science

Bioinformatics in Laboratory of Computer and Information Science HELSINKI UNIVERSITY OF TECHNOLOGY LABORATORY OF COMPUTER AND INFORMATION SCIENCE Bioinformatics in Laboratory of Computer and Information Science Samuel Kaski Research Two centers of excellence of the

Lisätiedot

State of the Union... Functional Genomics Research Stream. Molecular Biology. Genomics. Computational Biology

State of the Union... Functional Genomics Research Stream. Molecular Biology. Genomics. Computational Biology Functional Genomics Research Stream State of the Union... Research Meeting: February 16, 2010 Functional Genomics & Research Report III Concepts Genomics Molecular Biology Computational Biology Genome

Lisätiedot

Operatioanalyysi 2011, Harjoitus 4, viikko 40

Operatioanalyysi 2011, Harjoitus 4, viikko 40 Operatioanalyysi 2011, Harjoitus 4, viikko 40 H4t1, Exercise 4.2. H4t2, Exercise 4.3. H4t3, Exercise 4.4. H4t4, Exercise 4.5. H4t5, Exercise 4.6. (Exercise 4.2.) 1 4.2. Solve the LP max z = x 1 + 2x 2

Lisätiedot

7.4 Variability management

7.4 Variability management 7.4 Variability management time... space software product-line should support variability in space (different products) support variability in time (maintenance, evolution) 1 Product variation Product

Lisätiedot

Uusi Ajatus Löytyy Luonnosta 4 (käsikirja) (Finnish Edition)

Uusi Ajatus Löytyy Luonnosta 4 (käsikirja) (Finnish Edition) Uusi Ajatus Löytyy Luonnosta 4 (käsikirja) (Finnish Edition) Esko Jalkanen Click here if your download doesn"t start automatically Uusi Ajatus Löytyy Luonnosta 4 (käsikirja) (Finnish Edition) Esko Jalkanen

Lisätiedot

TIEKE Verkottaja Service Tools for electronic data interchange utilizers. Heikki Laaksamo

TIEKE Verkottaja Service Tools for electronic data interchange utilizers. Heikki Laaksamo TIEKE Verkottaja Service Tools for electronic data interchange utilizers Heikki Laaksamo TIEKE Finnish Information Society Development Centre (TIEKE Tietoyhteiskunnan kehittämiskeskus ry) TIEKE is a neutral,

Lisätiedot

Plasmid Name: pmm290. Aliases: none known. Length: bp. Constructed by: Mike Moser/Cristina Swanson. Last updated: 17 August 2009

Plasmid Name: pmm290. Aliases: none known. Length: bp. Constructed by: Mike Moser/Cristina Swanson. Last updated: 17 August 2009 Plasmid Name: pmm290 Aliases: none known Length: 11707 bp Constructed by: Mike Moser/Cristina Swanson Last updated: 17 August 2009 Description and application: This is a mammalian expression vector for

Lisätiedot

812336A C++ -kielen perusteet, 21.8.2010

812336A C++ -kielen perusteet, 21.8.2010 812336A C++ -kielen perusteet, 21.8.2010 1. Vastaa lyhyesti seuraaviin kysymyksiin (1p kaikista): a) Mitä tarkoittaa funktion ylikuormittaminen (overloading)? b) Mitä tarkoittaa jäsenfunktion ylimääritys

Lisätiedot

Master's Programme in Life Science Technologies (LifeTech) Prof. Juho Rousu Director of the Life Science Technologies programme 3.1.

Master's Programme in Life Science Technologies (LifeTech) Prof. Juho Rousu Director of the Life Science Technologies programme 3.1. Master's Programme in Life Science Technologies (LifeTech) Prof. Juho Rousu Director of the Life Science Technologies programme 3.1.2017 Life Science Technologies Where Life Sciences meet with Technology

Lisätiedot

Characterization of clay using x-ray and neutron scattering at the University of Helsinki and ILL

Characterization of clay using x-ray and neutron scattering at the University of Helsinki and ILL Characterization of clay using x-ray and neutron scattering at the University of Helsinki and ILL Ville Liljeström, Micha Matusewicz, Kari Pirkkalainen, Jussi-Petteri Suuronen and Ritva Serimaa 13.3.2012

Lisätiedot

1. SIT. The handler and dog stop with the dog sitting at heel. When the dog is sitting, the handler cues the dog to heel forward.

1. SIT. The handler and dog stop with the dog sitting at heel. When the dog is sitting, the handler cues the dog to heel forward. START START SIT 1. SIT. The handler and dog stop with the dog sitting at heel. When the dog is sitting, the handler cues the dog to heel forward. This is a static exercise. SIT STAND 2. SIT STAND. The

Lisätiedot

Returns to Scale Chapters

Returns to Scale Chapters Return to Scale Chapter 5.1-5.4 Saara Tuurala 26.9.2007 Index Introduction Baic Formulation of Retur to Scale Geometric Portrayal in DEA BCC Return to Scale CCR Return to Scale Summary Home Aignment Introduction

Lisätiedot

Valuation of Asian Quanto- Basket Options

Valuation of Asian Quanto- Basket Options Valuation of Asian Quanto- Basket Options (Final Presentation) 21.11.2011 Thesis Instructor and Supervisor: Prof. Ahti Salo Työn saa tallentaa ja julkistaa Aalto-yliopiston avoimilla verkkosivuilla. Muilta

Lisätiedot

7. Product-line architectures

7. Product-line architectures 7. Product-line architectures 7.1 Introduction 7.2 Product-line basics 7.3 Layered style for product-lines 7.4 Variability management 7.5 Benefits and problems with product-lines 1 Short history of software

Lisätiedot

make and make and make ThinkMath 2017

make and make and make ThinkMath 2017 Adding quantities Lukumäärienup yhdistäminen. Laske yhteensä?. Countkuinka howmonta manypalloja ballson there are altogether. and ja make and make and ja make on and ja make ThinkMath 7 on ja on on Vaihdannaisuus

Lisätiedot

ECVETin soveltuvuus suomalaisiin tutkinnon perusteisiin. Case:Yrittäjyyskurssi matkailualan opiskelijoille englantilaisen opettajan toteuttamana

ECVETin soveltuvuus suomalaisiin tutkinnon perusteisiin. Case:Yrittäjyyskurssi matkailualan opiskelijoille englantilaisen opettajan toteuttamana ECVETin soveltuvuus suomalaisiin tutkinnon perusteisiin Case:Yrittäjyyskurssi matkailualan opiskelijoille englantilaisen opettajan toteuttamana Taustaa KAO mukana FINECVET-hankeessa, jossa pilotoimme ECVETiä

Lisätiedot

Windows Phone. Module Descriptions. Opiframe Oy puh. +358 44 7220800 eero.huusko@opiframe.com. 02600 Espoo

Windows Phone. Module Descriptions. Opiframe Oy puh. +358 44 7220800 eero.huusko@opiframe.com. 02600 Espoo Windows Phone Module Descriptions Mikä on RekryKoulutus? Harvassa ovat ne työnantajat, jotka löytävät juuri heidän alansa hallitsevat ammatti-ihmiset valmiina. Fiksuinta on tunnustaa tosiasiat ja hankkia

Lisätiedot

Results on the new polydrug use questions in the Finnish TDI data

Results on the new polydrug use questions in the Finnish TDI data Results on the new polydrug use questions in the Finnish TDI data Multi-drug use, polydrug use and problematic polydrug use Martta Forsell, Finnish Focal Point 28/09/2015 Martta Forsell 1 28/09/2015 Esityksen

Lisätiedot

Strict singularity of a Volterra-type integral operator on H p

Strict singularity of a Volterra-type integral operator on H p Strict singularity of a Volterra-type integral operator on H p Santeri Miihkinen, University of Helsinki IWOTA St. Louis, 18-22 July 2016 Santeri Miihkinen, University of Helsinki Volterra-type integral

Lisätiedot

Kysymys 5 Compared to the workload, the number of credits awarded was (1 credits equals 27 working hours): (4)

Kysymys 5 Compared to the workload, the number of credits awarded was (1 credits equals 27 working hours): (4) Tilasto T1106120-s2012palaute Kyselyn T1106120+T1106120-s2012palaute yhteenveto: vastauksia (4) Kysymys 1 Degree programme: (4) TIK: TIK 1 25% ************** INF: INF 0 0% EST: EST 0 0% TLT: TLT 0 0% BIO:

Lisätiedot

Tietorakenteet ja algoritmit

Tietorakenteet ja algoritmit Tietorakenteet ja algoritmit Taulukon edut Taulukon haitat Taulukon haittojen välttäminen Dynaamisesti linkattu lista Linkatun listan solmun määrittelytavat Lineaarisen listan toteutus dynaamisesti linkattuna

Lisätiedot

Choose Finland-Helsinki Valitse Finland-Helsinki

Choose Finland-Helsinki Valitse Finland-Helsinki Write down the Temporary Application ID. If you do not manage to complete the form you can continue where you stopped with this ID no. Muista Temporary Application ID. Jos et onnistu täyttää lomake loppuun

Lisätiedot

Categorical Decision Making Units and Comparison of Efficiency between Different Systems

Categorical Decision Making Units and Comparison of Efficiency between Different Systems Categorical Decision Making Units and Comparison of Efficiency between Different Systems Mat-2.4142 Optimointiopin Seminaari Source William W. Cooper, Lawrence M. Seiford, Kaoru Tone: Data Envelopment

Lisätiedot

tgg agg Supplementary Figure S1.

tgg agg Supplementary Figure S1. ttaggatattcggtgaggtgatatgtctctgtttggaaatgtctccgccattaactcaag tggaaagtgtatagtaatgaatctttcaagcacacagatcacttcaaaagactgtttcaa catcacctcaggacaaaaagatgtactctcatttggatgctgtgatgccatgggtcacag attgcaattcccaagtgcccgttcttttacaccaaaatcaaagaagaatatctccccttt

Lisätiedot

T Statistical Natural Language Processing Answers 6 Collocations Version 1.0

T Statistical Natural Language Processing Answers 6 Collocations Version 1.0 T-61.5020 Statistical Natural Language Processing Answers 6 Collocations Version 1.0 1. Let s start by calculating the results for pair valkoinen, talo manually: Frequency: Bigrams valkoinen, talo occurred

Lisätiedot

anna minun kertoa let me tell you

anna minun kertoa let me tell you anna minun kertoa let me tell you anna minun kertoa I OSA 1. Anna minun kertoa sinulle mitä oli. Tiedän että osaan. Kykenen siihen. Teen nyt niin. Minulla on oikeus. Sanani voivat olla puutteellisia mutta

Lisätiedot

FETAL FIBROBLASTS, PASSAGE 10

FETAL FIBROBLASTS, PASSAGE 10 Double-stranded methylation patterns of a 104-bp L1 promoter in DNAs from fetal fibroblast passages 10, 14, 17, and 22 using barcoded hairpinbisulfite PCR. Fifteen L1 sequences were analyzed for passages

Lisätiedot

Teknillinen tiedekunta, matematiikan jaos Numeeriset menetelmät

Teknillinen tiedekunta, matematiikan jaos Numeeriset menetelmät Numeeriset menetelmät 1. välikoe, 14.2.2009 1. Määrää matriisin 1 1 a 1 3 a a 4 a a 2 1 LU-hajotelma kaikille a R. Ratkaise LU-hajotelmaa käyttäen yhtälöryhmä Ax = b, missä b = [ 1 3 2a 2 a + 3] T. 2.

Lisätiedot

Network to Get Work. Tehtäviä opiskelijoille Assignments for students. www.laurea.fi

Network to Get Work. Tehtäviä opiskelijoille Assignments for students. www.laurea.fi Network to Get Work Tehtäviä opiskelijoille Assignments for students www.laurea.fi Ohje henkilöstölle Instructions for Staff Seuraavassa on esitetty joukko tehtäviä, joista voit valita opiskelijaryhmällesi

Lisätiedot

Hankkeen toiminnot työsuunnitelman laatiminen

Hankkeen toiminnot työsuunnitelman laatiminen Hankkeen toiminnot työsuunnitelman laatiminen Hanketyöpaja LLP-ohjelman keskitettyjä hankkeita (Leonardo & Poikittaisohjelma) valmisteleville11.11.2011 Työsuunnitelma Vastaa kysymykseen mitä projektissa

Lisätiedot

Kvanttilaskenta - 1. tehtävät

Kvanttilaskenta - 1. tehtävät Kvanttilaskenta -. tehtävät Johannes Verwijnen January 9, 0 edx-tehtävät Vastauksissa on käytetty edx-kurssin materiaalia.. Problem False, sillä 0 0. Problem False, sillä 0 0 0 0. Problem A quantum state

Lisätiedot

Nuku hyvin, pieni susi -????????????,?????????????????. Kaksikielinen satukirja (suomi - venäjä) (www.childrens-books-bilingual.com) (Finnish Edition)

Nuku hyvin, pieni susi -????????????,?????????????????. Kaksikielinen satukirja (suomi - venäjä) (www.childrens-books-bilingual.com) (Finnish Edition) Nuku hyvin, pieni susi -????????????,?????????????????. Kaksikielinen satukirja (suomi - venäjä) (www.childrens-books-bilingual.com) (Finnish Edition) Click here if your download doesn"t start automatically

Lisätiedot

MALE ADULT FIBROBLAST LINE (82-6hTERT)

MALE ADULT FIBROBLAST LINE (82-6hTERT) Double-stranded methylation patterns of a 104-bp L1 promoter in DNAs from male and female fibroblasts, male leukocytes and female lymphoblastoid cells using hairpin-bisulfite PCR. Fifteen L1 sequences

Lisätiedot

11. Models With Restricted Multipliers Assurance Region Method

11. Models With Restricted Multipliers Assurance Region Method . Models With Restricted Mltipliers Assrance Region Method Kimmo Krki 3..27 Esitelmä - Kimmo Krki Contents Introdction to Models With Restricted Mltipliers (Ch 6.) Assrance region method (Ch 6.2) Formlation

Lisätiedot

How to handle uncertainty in future projections?

How to handle uncertainty in future projections? How to handle uncertainty in future projections? Samu Mäntyniemi, Fisheries and Environmental Management group (FEM), University of Helsinki http://www.helsinki.fi/science/fem/ Biotieteellinen tiedekunta

Lisätiedot

LYTH-CONS CONSISTENCY TRANSMITTER

LYTH-CONS CONSISTENCY TRANSMITTER LYTH-CONS CONSISTENCY TRANSMITTER LYTH-INSTRUMENT OY has generate new consistency transmitter with blade-system to meet high technical requirements in Pulp&Paper industries. Insurmountable advantages are

Lisätiedot

Telecommunication Software

Telecommunication Software Telecommunication Software Final exam 21.11.2006 COMPUTER ENGINEERING LABORATORY 521265A Vastaukset englanniksi tai suomeksi. / Answers in English or in Finnish. 1. (a) Määrittele sovellusviesti, PersonnelRecord,

Lisätiedot

Tilausvahvistus. Anttolan Urheilijat HENNA-RIIKKA HAIKONEN KUMMANNIEMENTIE 5 B RAHULA. Anttolan Urheilijat

Tilausvahvistus. Anttolan Urheilijat HENNA-RIIKKA HAIKONEN KUMMANNIEMENTIE 5 B RAHULA. Anttolan Urheilijat 7.80.4 Asiakasnumero: 3000359 KALLE MANNINEN KOVASTENLUODONTIE 46 51600 HAUKIVUORI Toimitusosoite: KUMMANNIEMENTIE 5 B 51720 RAHULA Viitteenne: Henna-Riikka Haikonen Viitteemme: Pyry Niemi +358400874498

Lisätiedot

C++11 seminaari, kevät Johannes Koskinen

C++11 seminaari, kevät Johannes Koskinen C++11 seminaari, kevät 2012 Johannes Koskinen Sisältö Mikä onkaan ongelma? Standardidraftin luku 29: Atomiset tyypit Muistimalli Rinnakkaisuus On multicore systems, when a thread writes a value to memory,

Lisätiedot

toukokuu 2011: Lukion kokeiden kehittämistyöryhmien suunnittelukokous

toukokuu 2011: Lukion kokeiden kehittämistyöryhmien suunnittelukokous Tuula Sutela toukokuu 2011: Lukion kokeiden kehittämistyöryhmien suunnittelukokous äidinkieli ja kirjallisuus, modersmål och litteratur, kemia, maantiede, matematiikka, englanti käsikirjoitukset vuoden

Lisätiedot

Oma sininen meresi (Finnish Edition)

Oma sininen meresi (Finnish Edition) Oma sininen meresi (Finnish Edition) Hannu Pirilä Click here if your download doesn"t start automatically Oma sininen meresi (Finnish Edition) Hannu Pirilä Oma sininen meresi (Finnish Edition) Hannu Pirilä

Lisätiedot

MEETING PEOPLE COMMUNICATIVE QUESTIONS

MEETING PEOPLE COMMUNICATIVE QUESTIONS Tiistilän koulu English Grades 7-9 Heikki Raevaara MEETING PEOPLE COMMUNICATIVE QUESTIONS Meeting People Hello! Hi! Good morning! Good afternoon! How do you do? Nice to meet you. / Pleased to meet you.

Lisätiedot

BDD (behavior-driven development) suunnittelumenetelmän käyttö open source projektissa, case: SpecFlow/.NET.

BDD (behavior-driven development) suunnittelumenetelmän käyttö open source projektissa, case: SpecFlow/.NET. BDD (behavior-driven development) suunnittelumenetelmän käyttö open source projektissa, case: SpecFlow/.NET. Pekka Ollikainen Open Source Microsoft CodePlex bio Verkkosivustovastaava Suomen Sarjakuvaseura

Lisätiedot

Constructive Alignment in Specialisation Studies in Industrial Pharmacy in Finland

Constructive Alignment in Specialisation Studies in Industrial Pharmacy in Finland Constructive Alignment in Specialisation Studies in Industrial Pharmacy in Finland Anne Mari Juppo, Nina Katajavuori University of Helsinki Faculty of Pharmacy 23.7.2012 1 Background Pedagogic research

Lisätiedot

Huom. tämä kulma on yhtä suuri kuin ohjauskulman muutos. lasketaan ajoneuvon keskipisteen ympyräkaaren jänteen pituus

Huom. tämä kulma on yhtä suuri kuin ohjauskulman muutos. lasketaan ajoneuvon keskipisteen ympyräkaaren jänteen pituus AS-84.327 Paikannus- ja navigointimenetelmät Ratkaisut 2.. a) Kun kuvan ajoneuvon kumpaakin pyörää pyöritetään tasaisella nopeudella, ajoneuvon rata on ympyränkaaren segmentin muotoinen. Hitaammin kulkeva

Lisätiedot

CS284A Representations & Algorithms for Molecular Biology. Xiaohui S. Xie University of California, Irvine

CS284A Representations & Algorithms for Molecular Biology. Xiaohui S. Xie University of California, Irvine CS284A Representations & Algorithms for Molecular Biology Xiaohui S. Xie University of California, Irvine Today s Goals Course information Challenges in computational biology Introduction to molecular

Lisätiedot

Use of spatial data in the new production environment and in a data warehouse

Use of spatial data in the new production environment and in a data warehouse Use of spatial data in the new production environment and in a data warehouse Nordic Forum for Geostatistics 2007 Session 3, GI infrastructure and use of spatial database Statistics Finland, Population

Lisätiedot

Viral DNA as a model for coil to globule transition

Viral DNA as a model for coil to globule transition Viral DNA as a model for coil to globule transition Marina Rossi Lab. of complex fluids and molecular biophysics LITA (Segrate) UNIVERSITA DEGLI STUDI DI MILANO - PhD Workshop October 14 th, 2013 Temperature

Lisätiedot

1.3 Lohkorakenne muodostetaan käyttämällä a) puolipistettä b) aaltosulkeita c) BEGIN ja END lausekkeita d) sisennystä

1.3 Lohkorakenne muodostetaan käyttämällä a) puolipistettä b) aaltosulkeita c) BEGIN ja END lausekkeita d) sisennystä OULUN YLIOPISTO Tietojenkäsittelytieteiden laitos Johdatus ohjelmointiin 811122P (5 op.) 12.12.2005 Ohjelmointikieli on Java. Tentissä saa olla materiaali mukana. Tenttitulokset julkaistaan aikaisintaan

Lisätiedot

Integration of Finnish web services in WebLicht Presentation in Freudenstadt 2010-10-16 by Jussi Piitulainen

Integration of Finnish web services in WebLicht Presentation in Freudenstadt 2010-10-16 by Jussi Piitulainen Integration of Finnish web services in WebLicht Presentation in Freudenstadt 2010-10-16 by Jussi Piitulainen Who we are FIN-CLARIN University of Helsinki The Language Bank of Finland CSC - The Center for

Lisätiedot

Operatioanalyysi 2011, Harjoitus 2, viikko 38

Operatioanalyysi 2011, Harjoitus 2, viikko 38 Operatioanalyysi 2011, Harjoitus 2, viikko 38 H2t1, Exercise 1.1. H2t2, Exercise 1.2. H2t3, Exercise 2.3. H2t4, Exercise 2.4. H2t5, Exercise 2.5. (Exercise 1.1.) 1 1.1. Model the following problem mathematically:

Lisätiedot

Infrastruktuurin asemoituminen kansalliseen ja kansainväliseen kenttään Outi Ala-Honkola Tiedeasiantuntija

Infrastruktuurin asemoituminen kansalliseen ja kansainväliseen kenttään Outi Ala-Honkola Tiedeasiantuntija Infrastruktuurin asemoituminen kansalliseen ja kansainväliseen kenttään Outi Ala-Honkola Tiedeasiantuntija 1 Asemoitumisen kuvaus Hakemukset parantuneet viime vuodesta, mutta paneeli toivoi edelleen asemoitumisen

Lisätiedot

Enterprise Architecture TJTSE Yrityksen kokonaisarkkitehtuuri

Enterprise Architecture TJTSE Yrityksen kokonaisarkkitehtuuri Enterprise Architecture TJTSE25 2009 Yrityksen kokonaisarkkitehtuuri Jukka (Jups) Heikkilä Professor, IS (ebusiness) Faculty of Information Technology University of Jyväskylä e-mail: jups@cc.jyu.fi tel:

Lisätiedot

Toppila/Kivistö 10.01.2013 Vastaa kaikkin neljään tehtävään, jotka kukin arvostellaan asteikolla 0-6 pistettä.

Toppila/Kivistö 10.01.2013 Vastaa kaikkin neljään tehtävään, jotka kukin arvostellaan asteikolla 0-6 pistettä. ..23 Vastaa kaikkin neljään tehtävään, jotka kukin arvostellaan asteikolla -6 pistettä. Tehtävä Ovatko seuraavat väittämät oikein vai väärin? Perustele vastauksesi. (a) Lineaarisen kokonaislukutehtävän

Lisätiedot

Chapter 9 Motif finding. Chaochun Wei Spring 2019

Chapter 9 Motif finding. Chaochun Wei Spring 2019 1896 1920 1987 2006 Chapter 9 Motif finding Chaochun Wei Spring 2019 Contents 1. Reading materials 2. Sequence structure modeling Motif finding Regulatory module finding 2 Reading materials Tompa et al

Lisätiedot

Bioinformatics. Sequence Analysis: Part III. Pattern Searching and Gene Finding. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute

Bioinformatics. Sequence Analysis: Part III. Pattern Searching and Gene Finding. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Sequence Analysis: Part III. Pattern Searching and Gene Finding Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Course Syllabus Jan 7 Jan 14 Jan 21 Jan 28 Feb 4 Feb 11 Feb 18

Lisätiedot

OP1. PreDP StudyPlan

OP1. PreDP StudyPlan OP1 PreDP StudyPlan PreDP The preparatory year classes are in accordance with the Finnish national curriculum, with the distinction that most of the compulsory courses are taught in English to familiarize

Lisätiedot

Rekisteröiminen - FAQ

Rekisteröiminen - FAQ Rekisteröiminen - FAQ Miten Akun/laturin rekisteröiminen tehdään Akun/laturin rekisteröiminen tapahtuu samalla tavalla kuin nykyinen takuurekisteröityminen koneille. Nykyistä tietokantaa on muokattu niin,

Lisätiedot

Kvanttilaskenta - 2. tehtävät

Kvanttilaskenta - 2. tehtävät Kvanttilaskenta -. tehtävät Johannes Verwijnen January 8, 05 edx-tehtävät Vastauksissa on käytetty edx-kurssin materiaalia.. Problem The inner product of + and is. Edelleen false, kts. viikon tehtävä 6..

Lisätiedot

1.3Lohkorakenne muodostetaan käyttämällä a) puolipistettä b) aaltosulkeita c) BEGIN ja END lausekkeita d) sisennystä

1.3Lohkorakenne muodostetaan käyttämällä a) puolipistettä b) aaltosulkeita c) BEGIN ja END lausekkeita d) sisennystä OULUN YLIOPISTO Tietojenkäsittelytieteiden laitos Johdatus ohjelmointiin 81122P (4 ov.) 30.5.2005 Ohjelmointikieli on Java. Tentissä saa olla materiaali mukana. Tenttitulokset julkaistaan aikaisintaan

Lisätiedot

21~--~--~r--1~~--~--~~r--1~

21~--~--~r--1~~--~--~~r--1~ - K.Loberg FYSE420 DIGITAL ELECTRONICS 13.05.2011 1. Toteuta alla esitetyn sekvenssin tuottava asynkroninen pun. Anna heratefunktiot, siirtotaulukko ja kokonaistilataulukko ( exitation functions, transition

Lisätiedot

HARJOITUS- PAKETTI A

HARJOITUS- PAKETTI A Logistiikka A35A00310 Tuotantotalouden perusteet HARJOITUS- PAKETTI A (6 pistettä) TUTA 19 Luento 3.Ennustaminen County General 1 piste The number of heart surgeries performed at County General Hospital

Lisätiedot

The Viking Battle - Part Version: Finnish

The Viking Battle - Part Version: Finnish The Viking Battle - Part 1 015 Version: Finnish Tehtävä 1 Olkoon kokonaisluku, ja olkoon A n joukko A n = { n k k Z, 0 k < n}. Selvitä suurin kokonaisluku M n, jota ei voi kirjoittaa yhden tai useamman

Lisätiedot

Methods S1. Sequences relevant to the constructed strains, Related to Figures 1-6.

Methods S1. Sequences relevant to the constructed strains, Related to Figures 1-6. Methods S1. Sequences relevant to the constructed strains, Related to Figures 1-6. A. Promoter Sequences Gal4 binding sites are highlighted in the color referenced in Figure 1A when possible. Site 1: red,

Lisätiedot

National Building Code of Finland, Part D1, Building Water Supply and Sewerage Systems, Regulations and guidelines 2007

National Building Code of Finland, Part D1, Building Water Supply and Sewerage Systems, Regulations and guidelines 2007 National Building Code of Finland, Part D1, Building Water Supply and Sewerage Systems, Regulations and guidelines 2007 Chapter 2.4 Jukka Räisä 1 WATER PIPES PLACEMENT 2.4.1 Regulation Water pipe and its

Lisätiedot

You can check above like this: Start->Control Panel->Programs->find if Microsoft Lync or Microsoft Lync Attendeed is listed

You can check above like this: Start->Control Panel->Programs->find if Microsoft Lync or Microsoft Lync Attendeed is listed Online Meeting Guest Online Meeting for Guest Participant Lync Attendee Installation Online kokous vierailevalle osallistujalle Lync Attendee Asennus www.ruukki.com Overview Before you can join to Ruukki

Lisätiedot

FinFamily Installation and importing data (11.1.2016) FinFamily Asennus / Installation

FinFamily Installation and importing data (11.1.2016) FinFamily Asennus / Installation FinFamily Asennus / Installation 1 Sisällys / Contents FinFamily Asennus / Installation... 1 1. Asennus ja tietojen tuonti / Installation and importing data... 4 1.1. Asenna Java / Install Java... 4 1.2.

Lisätiedot

Topologies on pseudoinnite paths

Topologies on pseudoinnite paths Topologies on pseudoinnite paths Andrey Kudinov Institute for Information Transmission Problems, Moscow National Research University Higher School of Economics, Moscow Moscow Institute of Physics and Technology

Lisätiedot

Increase of opioid use in Finland when is there enough key indicator data to state a trend?

Increase of opioid use in Finland when is there enough key indicator data to state a trend? Increase of opioid use in Finland when is there enough key indicator data to state a trend? Martta Forsell, Finnish Focal Point 28.9.2015 Esityksen nimi / Tekijä 1 Martta Forsell Master of Social Sciences

Lisätiedot

Tarua vai totta: sähkön vähittäismarkkina ei toimi? 11.2.2015 Satu Viljainen Professori, sähkömarkkinat

Tarua vai totta: sähkön vähittäismarkkina ei toimi? 11.2.2015 Satu Viljainen Professori, sähkömarkkinat Tarua vai totta: sähkön vähittäismarkkina ei toimi? 11.2.2015 Satu Viljainen Professori, sähkömarkkinat Esityksen sisältö: 1. EU:n energiapolitiikka on se, joka ei toimi 2. Mihin perustuu väite, etteivät

Lisätiedot

Searching (Sub-)Strings. Ulf Leser

Searching (Sub-)Strings. Ulf Leser Searching (Sub-)Strings Ulf Leser This Lecture Exact substring search Naïve Boyer-Moore Searching with profiles Sequence profiles Ungapped approximate search Statistical evaluation of search results Ulf

Lisätiedot

Information on Finnish Language Courses Spring Semester 2018 Päivi Paukku & Jenni Laine Centre for Language and Communication Studies

Information on Finnish Language Courses Spring Semester 2018 Päivi Paukku & Jenni Laine Centre for Language and Communication Studies Information on Finnish Language Courses Spring Semester 2018 Päivi Paukku & Jenni Laine 4.1.2018 Centre for Language and Communication Studies Puhutko suomea? -Hei! -Hei hei! -Moi! -Moi moi! -Terve! -Terve

Lisätiedot

Paikkatiedon semanttinen mallinnus, integrointi ja julkaiseminen Case Suomalainen ajallinen paikkaontologia SAPO

Paikkatiedon semanttinen mallinnus, integrointi ja julkaiseminen Case Suomalainen ajallinen paikkaontologia SAPO Paikkatiedon semanttinen mallinnus, integrointi ja julkaiseminen Case Suomalainen ajallinen paikkaontologia SAPO Tomi Kauppinen, Eero Hyvönen, Jari Väätäinen Semantic Computing Research Group (SeCo) http://www.seco.tkk.fi/

Lisätiedot

VAASAN YLIOPISTO Humanististen tieteiden kandidaatin tutkinto / Filosofian maisterin tutkinto

VAASAN YLIOPISTO Humanististen tieteiden kandidaatin tutkinto / Filosofian maisterin tutkinto VAASAN YLIOPISTO Humanististen tieteiden kandidaatin tutkinto / Filosofian maisterin tutkinto Tämän viestinnän, nykysuomen ja englannin kandidaattiohjelman valintakokeen avulla Arvioidaan viestintävalmiuksia,

Lisätiedot

FIS IMATRAN KYLPYLÄHIIHDOT Team captains meeting

FIS IMATRAN KYLPYLÄHIIHDOT Team captains meeting FIS IMATRAN KYLPYLÄHIIHDOT 8.-9.12.2018 Team captains meeting 8.12.2018 Agenda 1 Opening of the meeting 2 Presence 3 Organizer s personell 4 Jury 5 Weather forecast 6 Composition of competitors startlists

Lisätiedot

KONEISTUSKOKOONPANON TEKEMINEN NX10-YMPÄRISTÖSSÄ

KONEISTUSKOKOONPANON TEKEMINEN NX10-YMPÄRISTÖSSÄ KONEISTUSKOKOONPANON TEKEMINEN NX10-YMPÄRISTÖSSÄ https://community.plm.automation.siemens.com/t5/tech-tips- Knowledge-Base-NX/How-to-simulate-any-G-code-file-in-NX- CAM/ta-p/3340 Koneistusympäristön määrittely

Lisätiedot

SIMULINK S-funktiot. SIMULINK S-funktiot

SIMULINK S-funktiot. SIMULINK S-funktiot S-funktio on ohjelmointikielellä (Matlab, C, Fortran) laadittu oma algoritmi tai dynaamisen järjestelmän kuvaus, jota voidaan käyttää Simulink-malleissa kuin mitä tahansa valmista lohkoa. S-funktion rakenne

Lisätiedot