Genome 373: Genomic Informatics. Professors Elhanan Borenstein and Jay Shendure

Samankaltaiset tiedostot
On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

Capacity Utilization

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

1.3 Lohkorakenne muodostetaan käyttämällä a) puolipistettä b) aaltosulkeita c) BEGIN ja END lausekkeita d) sisennystä

Choose Finland-Helsinki Valitse Finland-Helsinki

Uusi Ajatus Löytyy Luonnosta 4 (käsikirja) (Finnish Edition)

Kysymys 5 Compared to the workload, the number of credits awarded was (1 credits equals 27 working hours): (4)

Information on preparing Presentation

Efficiency change over time

1.3Lohkorakenne muodostetaan käyttämällä a) puolipistettä b) aaltosulkeita c) BEGIN ja END lausekkeita d) sisennystä

1. Liikkuvat määreet

Network to Get Work. Tehtäviä opiskelijoille Assignments for students.

AYYE 9/ HOUSING POLICY

ECVETin soveltuvuus suomalaisiin tutkinnon perusteisiin. Case:Yrittäjyyskurssi matkailualan opiskelijoille englantilaisen opettajan toteuttamana

Information on Finnish Courses Autumn Semester 2017 Jenni Laine & Päivi Paukku Centre for Language and Communication Studies

Information on Finnish Language Courses Spring Semester 2017 Jenni Laine

812336A C++ -kielen perusteet,

You can check above like this: Start->Control Panel->Programs->find if Microsoft Lync or Microsoft Lync Attendeed is listed

16. Allocation Models

Constructive Alignment in Specialisation Studies in Industrial Pharmacy in Finland

Results on the new polydrug use questions in the Finnish TDI data

The CCR Model and Production Correspondence

Returns to Scale II. S ysteemianalyysin. Laboratorio. Esitelmä 8 Timo Salminen. Teknillinen korkeakoulu

S SÄHKÖTEKNIIKKA JA ELEKTRONIIKKA

TU-C2030 Operations Management Project. Introduction lecture November 2nd, 2016 Lotta Lundell, Rinna Toikka, Timo Seppälä

Curriculum. Gym card

7. Product-line architectures

Sisällysluettelo Table of contents

Expression of interest

Bioinformatics in Laboratory of Computer and Information Science

OP1. PreDP StudyPlan

Statistical design. Tuomas Selander

Information on Finnish Language Courses Spring Semester 2018 Päivi Paukku & Jenni Laine Centre for Language and Communication Studies

make and make and make ThinkMath 2017

Skene. Games Refueled. Muokkaa perustyyl. for Health, Kuopio

EUROOPAN PARLAMENTTI

7.4 Variability management

Voice Over LTE (VoLTE) By Miikka Poikselkä;Harri Holma;Jukka Hongisto

TIETEEN PÄIVÄT OULUSSA

Nuku hyvin, pieni susi -????????????,?????????????????. Kaksikielinen satukirja (suomi - venäjä) ( (Finnish Edition)

Teacher's Professional Role in the Finnish Education System Katriina Maaranen Ph.D. Faculty of Educational Sciences University of Helsinki, Finland

MUSEOT KULTTUURIPALVELUINA

FETAL FIBROBLASTS, PASSAGE 10

Additions, deletions and changes to courses for the academic year Mitä vanhoja kursseja uusi korvaa / kommentit

ELEC-C5230 Digitaalisen signaalinkäsittelyn perusteet

CS284A Representations & Algorithms for Molecular Biology. Xiaohui S. Xie University of California, Irvine

VAASAN YLIOPISTO Humanististen tieteiden kandidaatin tutkinto / Filosofian maisterin tutkinto

Basic Flute Technique

Miksi Suomi on Suomi (Finnish Edition)

Salasanan vaihto uuteen / How to change password

MALE ADULT FIBROBLAST LINE (82-6hTERT)

Master's Programme in Life Science Technologies (LifeTech) Prof. Juho Rousu Director of the Life Science Technologies programme 3.1.

Microsoft Lync 2010 Attendee

Osallistujaraportit Erasmus+ ammatillinen koulutus

Bachelor level exams by date in Otaniemi

Bachelor level exams by subject in Otaniemi

Miksi kotikansainvälisyys? Kansainvälinen yliopisto opiskelijanäkökulmasta Milla Ovaska Asiantuntija, kansainväliset asiat Aalto-yliopiston

Security server v6 installation requirements

Opiskelijat valtaan! TOPIC MASTER menetelmä lukion englannin opetuksessa. Tuija Kae, englannin kielen lehtori Sotungin lukio ja etälukio

Capacity utilization

Travel Getting Around

Rekisteröiminen - FAQ

OFFICE 365 OPISKELIJOILLE

Green Growth Sessio - Millaisilla kansainvälistymismalleilla kasvumarkkinoille?

While we compile backlinks report, You can visit following handy links. Music download

Security server v6 installation requirements

Business Opening. Arvoisa Herra Presidentti Very formal, recipient has a special title that must be used in place of their name

SELL Student Games kansainvälinen opiskelijaurheilutapahtuma

Immigration Studying. Studying - University. Stating that you want to enroll. Stating that you want to apply for a course.

Recommended background: Structural Engineering I and II

FinFamily Installation and importing data ( ) FinFamily Asennus / Installation

Hankkeen toiminnot työsuunnitelman laatiminen

Windows Phone. Module Descriptions. Opiframe Oy puh Espoo

Hankkeiden vaikuttavuus: Työkaluja hankesuunnittelun tueksi

Esimerkkinä - ilmainen blogi-julkaisujärjestelmä. WordPress:stä on myös palvelimelle asennettava versio (WordPress.

Opiskelijoiden ajatuksia koulun alkuun liittyen / students thoughts about the beginning of their studies at KSYK

Vertaispalaute. Vertaispalaute, /9

Other approaches to restrict multipliers

Travel Accommodations

Students Experiences of Workplace Learning Marja Samppala, Med, doctoral student

Miehittämätön meriliikenne

1. Gender - Sukupuoli N = Age - Ikä N = 65. Female Nainen. Male Mies

HARJOITUS- PAKETTI A

Counting quantities 1-3

Kuvankäsi/ely. Vieraana Jorma Laaksonen Tietotekniikan laitos. Viikko Luento Ope-ajat Harjoitus 7: Tietoliikenteen signaalinkäsi/ely

BLOCKCHAINS AND ODR: SMART CONTRACTS AS AN ALTERNATIVE TO ENFORCEMENT

S SÄHKÖTEKNIIKKA JA ELEKTRONIIKKA

ESITTELY. Valitse oppilas jonka haluaisit esitellä luokallesi ja täytä alla oleva kysely. Age Grade Getting to school. School day.

anna minun kertoa let me tell you

Plasmid Name: pmm290. Aliases: none known. Length: bp. Constructed by: Mike Moser/Cristina Swanson. Last updated: 17 August 2009

Uusi Ajatus Löytyy Luonnosta 3 (Finnish Edition)

EVALUATION FOR THE ERASMUS+-PROJECT, STUDENTSE

FinFamily PostgreSQL installation ( ) FinFamily PostgreSQL

State of the Union... Functional Genomics Research Stream. Molecular Biology. Genomics. Computational Biology

Operatioanalyysi 2011, Harjoitus 4, viikko 40

Data protection template

BDD (behavior-driven development) suunnittelumenetelmän käyttö open source projektissa, case: SpecFlow/.NET.

TM ETRS-TM35FIN-ETRS89 WTG

TM ETRS-TM35FIN-ETRS89 WTG

Transkriptio:

Genome 373: Genomic Informatics Professors Elhanan Borenstein and Jay Shendure

Genome 373 This course is intended to introduce students to the breadth of problems and methods in computational analysis of genomes, arguably the single most important new area in biological research. The specific subjects will include: Sequence alignment Sequencing and next generation sequencing Gene prediction Micro-array analysis Evolutionary relationships and phylogeny Biological systems

Outline Course logistics Introduction to Bioinformatics Introduction to Python

Instructors Elhanan Borenstein: Weeks 1-2, 8-10 Jay Shendure: Weeks 3-7 Office hours: Monday 11:20-12:00 Jacob Kitzman(TA) will teach additional topics including programming and problem solving skills. Material covered in section is required, and will be on the exams.

Webpage Web site: http://elbo.gs.washington.edu/courses/gs_373_12_sp/ Page has links to Lecture notes Handouts Homework assignments Many useful resources on: Bioinformatics Python

Programming Note: Historically, this course required prior programming experience. The first couple of weeks in class and the first few weeks of section will focus on learning to program in Python. If you do not have any programming experience, that s ok, but you will need to work hard to catch up.

Grading 50% homework 20% midterm exam (in class) 30% final exam, Mon, June 4 Final exam is cumulative.

Homework Posted online each Wednesday and due the following Wednesday. Homework is a mix of written problems and programming. Homework assignments are to be submitted by email! Programming assignments should be implemented in Python. For other languages, please ask Jacob. More on home assignment submission in the quiz section.

Textbooks

Background survey Please write on the index card your: 1. Name and email 2. Major 3. Primarybackground (biology, computation, other) 4. Programming experience (how much, what language) 5. Registered/not-registered/waiting-list

Why Bioinformatics?

tgcaagcatgcacatgtaccaggagaaaatgaagacaattgtggaaacttttagacttttcatcaactttctagtgtcacttttttgccgctttcct atctgatagttgcgaagactccgaagaaaatgagaatggtgaaggctagcatgctgatgcttcatttctctggagcaattgtggatttctatctaag cttcatttcgatcccagtgctcactttgcccgtttgctcaggtatccattgggattctcgttggtgttaggaattccaacgtctgttcaagtttata tcggagtttcatgtatgggcggtgggtcgctctgttgcaggaggtcttgaatttcttttttgcagtaatcggtgtaactattcttatatttttcgaa aatcgttactttcaactaatcaatggatcttctggtggtagaagttggaagcgaaaactatatgttttgtgtaattacgcgttctctgtaactttta tagctccagcgtttttagacatttttagtgaagaacaaggaagagcgtgcacgtttgaagtaagttaggcaaaccaaactcgctagtgtgatgaaat tttccagaaaattccgagtatccctatcgacgtgccttctcgctcaggatattttgtcctattaattgataacccagtctacagcatttgcgtaagc ctcttggtaattaaagtgtgcccacaaattggtatagtcgttttgttcatattcccttatattgttcaaacgaaatcacattctcgagccacacttc gtttacttcttcacttttttatcgcgatgtgtatccagctgtctattccatttttggtcatcttcttgccggctgcttttatagtgtacgcaattca atatgactattataatcaaggtatgaatattaggccttccacgaaggcgctattctcgcccgcccgtaccacaccaacgctcttctcagttgcacgc ggctatagtagcgcgagggcccgcgtagcgtcggccgccttcatagaaggtctaatgaatatatagtattaagtataatttaaataaagtttcagca gcaaacaacttggcgatggcaacaatggcattccatggggtatgtactacactgaccatgatcatcgtgcatacaccgtatcgtaacgctactttga gcattttacatctgaaatcggaaaaatcggcaaaaacagtgactgattcgaagattgtgtggaaaagtaacaagggagtacagatgacataaactat gcccattgttaccctatattttatttttctctatggtgacaactttatcttaagaaaaacacgcatataaatcaagcagttcctggtcacaggacgt ttacttccacctgtttctaatttcttataaaaccctatatctttcaagttttttccacaagactctgccactctgacacttatgtgctcgactagcc tcagcttctttgcttccgagcaaacatatataaaacttctacatactcttaccatacttgaactttccactcactcttttggagcatacatcatcat tacaaaaacaccgaaaaagttggaatccgtgaaggccagcatgctctatctacaatttgttggagcatttgtcgatgtctatttcagttggttagct atgccgattctagtactacctttatgtgcaggacatgcgattggcttactttcattttttggggttccaagctcgttgcaagtttatgtaggtttct gttcactagcaggttggttcttaagaatgatggagagcgtcacatgtattgtgttgtacagatacaatttgaaagcaatccaatacagcgtgtaaaa gttttgcaattataaacatcattgcagttatggttatgacagtagtgatctttctggaagatcgtcgatatcggttggtgaacggtcaaaagtcaaa caaaatgagaaaattgtatcggttactgtttgtcacagctaattatgtttatgctacattgtaccctgctcccatatactttttgcttcccgaccaa gaatatggaagaattttatcgaaaagtgtacgtcttaaaaagtttgaaacatatacaatgaaatgtcttacttttaaagtttgcgtttcagaaaaat ccgtgtattccgaacgaatatttaaaccatcctaatttctttttgcttgatctcgatggaaagtatacttcaatttgtatcctgcttatgttgagtt ctctggtctctcaaatgttttggcaaattggactgattttccgtcagatgctcaaaaatccgtccgtttctcaaaatacgcaccgactacaatacca gtttttaattgcaatgagcttgcaaggcaccattccaatgattatcattgtttttccagcttttttctatgttgtctcaattatgttaaattatcat aatcaaggtattgtatctattcggaacaagacattaaacataattccaacttttcaggtgcaaataacttatcgtttcttatcatttccatgcatgg agttctatcaacgttgacaatgctcatggcacacagaccgtatagacaatcgattgtcaaaatgttgaatctgaatttcaataaggcaggtggtggt gttcaacgtatttggacgctttccagaagaaataattaatgatgaccttggaaaaggctaatcttcacaacaatcaaatcaaataatcataaaagtt tttattgaagaaaaataaactatctgtgcacagaaatccaatgaattgctctatctacaatttgttggagcatttgtcgatgtctatttcagttggt tagctatgccgattctagtactacctttatgtgcaggacatgcgattggcttactttcattttttggggttccaagctcgttgcaagtttatgtagg tttctgttcactagcaggttggttcttaagaatgatggagagcgtcacatgtattgtgttgtacagatacaatttgaaagcaatccaatacagcgtg taaaagttttgcaattataaacatcattgcagttatggttatgacagtagtgatctttctggaagatcgtcgatatcggttggtgaacggtcaaaag Find the binding sequence: caattatgttaaa

tgcaagcatgcacatgtaccaggagaaaatgaagacaattgtggaaacttttagacttttcatcaactttctagtgtcacttttttgccgctttcct atctgatagttgcgaagactccgaagaaaatgagaatggtgaaggctagcatgctgatgcttcatttctctggagcaattgtggatttctatctaag cttcatttcgatcccagtgctcactttgcccgtttgctcaggtatccattgggattctcgttggtgttaggaattccaacgtctgttcaagtttata tcggagtttcatgtatgggcggtgggtcgctctgttgcaggaggtcttgaatttcttttttgcagtaatcggtgtaactattcttatatttttcgaa aatcgttactttcaactaatcaatggatcttctggtggtagaagttggaagcgaaaactatatgttttgtgtaattacgcgttctctgtaactttta tagctccagcgtttttagacatttttagtgaagaacaaggaagagcgtgcacgtttgaagtaagttaggcaaaccaaactcgctagtgtgatgaaat tttccagaaaattccgagtatccctatcgacgtgccttctcgctcaggatattttgtcctattaattgataacccagtctacagcatttgcgtaagc ctcttggtaattaaagtgtgcccacaaattggtatagtcgttttgttcatattcccttatattgttcaaacgaaatcacattctcgagccacacttc gtttacttcttcacttttttatcgcgatgtgtatccagctgtctattccatttttggtcatcttcttgccggctgcttttatagtgtacgcaattca atatgactattataatcaaggtatgaatattaggccttccacgaaggcgctattctcgcccgcccgtaccacaccaacgctcttctcagttgcacgc ggctatagtagcgcgagggcccgcgtagcgtcggccgccttcatagaaggtctaatgaatatatagtattaagtataatttaaataaagtttcagca gcaaacaacttggcgatggcaacaatggcattccatggggtatgtactacactgaccatgatcatcgtgcatacaccgtatcgtaacgctactttga gcattttacatctgaaatcggaaaaatcggcaaaaacagtgactgattcgaagattgtgtggaaaagtaacaagggagtacagatgacataaactat gcccattgttaccctatattttatttttctctatggtgacaactttatcttaagaaaaacacgcatataaatcaagcagttcctggtcacaggacgt ttacttccacctgtttctaatttcttataaaaccctatatctttcaagttttttccacaagactctgccactctgacacttatgtgctcgactagcc tcagcttctttgcttccgagcaaacatatataaaacttctacatactcttaccatacttgaactttccactcactcttttggagcatacatcatcat tacaaaaacaccgaaaaagttggaatccgtgaaggccagcatgctctatctacaatttgttggagcatttgtcgatgtctatttcagttggttagct atgccgattctagtactacctttatgtgcaggacatgcgattggcttactttcattttttggggttccaagctcgttgcaagtttatgtaggtttct gttcactagcaggttggttcttaagaatgatggagagcgtcacatgtattgtgttgtacagatacaatttgaaagcaatccaatacagcgtgtaaaa gttttgcaattataaacatcattgcagttatggttatgacagtagtgatctttctggaagatcgtcgatatcggttggtgaacggtcaaaagtcaaa caaaatgagaaaattgtatcggttactgtttgtcacagctaattatgtttatgctacattgtaccctgctcccatatactttttgcttcccgaccaa gaatatggaagaattttatcgaaaagtgtacgtcttaaaaagtttgaaacatatacaatgaaatgtcttacttttaaagtttgcgtttcagaaaaat ccgtgtattccgaacgaatatttaaaccatcctaatttctttttgcttgatctcgatggaaagtatacttcaatttgtatcctgcttatgttgagtt ctctggtctctcaaatgttttggcaaattggactgattttccgtcagatgctcaaaaatccgtccgtttctcaaaatacgcaccgactacaatacca gtttttaattgcaatgagcttgcaaggcaccattccaatgattatcattgtttttccagcttttttctatgttgtctcaattatgttaaattatcat aatcaaggtattgtatctattcggaacaagacattaaacataattccaacttttcaggtgcaaataacttatcgtttcttatcatttccatgcatgg agttctatcaacgttgacaatgctcatggcacacagaccgtatagacaatcgattgtcaaaatgttgaatctgaatttcaataaggcaggtggtggt gttcaacgtatttggacgctttccagaagaaataattaatgatgaccttggaaaaggctaatcttcacaacaatcaaatcaaataatcataaaagtt tttattgaagaaaaataaactatctgtgcacagaaatccaatgaattgctctatctacaatttgttggagcatttgtcgatgtctatttcagttggt tagctatgccgattctagtactacctttatgtgcaggacatgcgattggcttactttcattttttggggttccaagctcgttgcaagtttatgtagg tttctgttcactagcaggttggttcttaagaatgatggagagcgtcacatgtattgtgttgtacagatacaatttgaaagcaatccaatacagcgtg taaaagttttgcaattataaacatcattgcagttatggttatgacagtagtgatctttctggaagatcgtcgatatcggttggtgaacggtcaaaag Find the binding sequence: caattatgttaaa

Computer processing power doubles every ~2 years. Moore s law dotted line - 2 year doubling

Sequencing cost decreasing much fasterthan computing cost >2-fold drop per year? - changing so fast hard to be specific

Viruses Bacteria Archaea Fungi Animals Plants ~3-1200 Kb ~1-5 Mb ~1-5 Mb ~10-50 Mb ~100-5,000 Mb ~100-10,000 Mb As of 2011 done or nearly done > 2,000 viruses > 1,000 bacteria and archaea Hundreds of fungi Dozens of protists Dozens of nematodes and insects 6 fish, 1 reptile, 4 birds, 1 amphibian About 10 plants About 40 mammals (+multiple individuals) Microbial communities (e.g., human microbiome) ~3100 bases (3.1Kb)

A computational bottleneck

Find the binding sequence: caattatgttaaa allowing for one mutation and one insertion gaattatgttaaa catttatgttaaa caatt-atgttaaa cagttat-gttaaa cagttatgttaaa caattatgtta-aa caattatattaaa cagttatgttaa-a caattatgttaat caattatgt-taaa caattatgttaaa caaatatgttaaa caat-tatgttaaa ca-attatggtaaa caattatgttaaa c-aattatgttata caattatgttaga

How well can the string GAATTCAGTTA match the string GGATCGA? (what is the best alignment between the two strings?) G A A T T C A G T T A G G A T C G - - A

InformaticsChallenges: Examples Sequence comparison: Find the best match (alignment) of a given sequence in a large dataset of sequences Find the best alignment of two sequences Find the best alignment of multiple sequences Motif and gene finding Relationship between sequences Phylogeny Clustering and classification Many manymanymore