Eukaryotic Comparative Genomics

Detecting Conserved Sequences
(Portraits: Charles Darwin, Motoo Kimura)

Evolution of Neutral DNA
[Figure: aligned DNA sequences; asterisks mark individual positions]

Evolution of Non-Neutral DNA
[Figure: aligned DNA sequences; asterisks mark individual positions]

Multi-Species Alignment

ATGTGGCGCAGCCTGTGCCAGCTGGACGATCGA
ATGTAGCCTAGCCAGTGCCAGCTGGACGATCGA
GTACATCGATAGCTTAGAATGCTGGACGATCTC
GTACGTCGATAGCATAGAATGCTGGACGATCTC
 *    *     *   *   ***********

How to do Comparative Genomics
1. Choose species to analyze
2. Align sequences
3. Identify stretches of highly conserved nucleotides
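Step 3 can be sketched in a few lines of Python. This is only an illustration, not the method used in the lecture: the function names and the `min_len` threshold are hypothetical, and a column counts as conserved only when every sequence carries the same base and none has a gap. The input is the four-sequence alignment from the Multi-Species Alignment slide.

```python
def conserved_columns(seqs):
    """Return a string with '*' under every column where all sequences agree."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*seqs):
        # Assumption: a gap character never counts as conserved.
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


def conserved_stretches(seqs, min_len=5):
    """Return 0-based, half-open (start, end) runs of conserved columns."""
    marks = conserved_columns(seqs)
    runs, start = [], None
    for i, mark in enumerate(marks + " "):  # trailing sentinel closes a final run
        if mark == "*" and start is None:
            start = i
        elif mark != "*" and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


alignment = [
    "ATGTGGCGCAGCCTGTGCCAGCTGGACGATCGA",
    "ATGTAGCCTAGCCAGTGCCAGCTGGACGATCGA",
    "GTACATCGATAGCTTAGAATGCTGGACGATCTC",
    "GTACGTCGATAGCATAGAATGCTGGACGATCTC",
]
print(conserved_columns(alignment))
print(conserved_stretches(alignment))
```

With the default threshold the isolated conserved columns are ignored and only the long run near the 3' end of the alignment is reported.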

Choose species
Closely related species: align well, not many changes
Distantly related species: hard to align, lots of changes

[Figure: yeast phylogeny with approximate divergence times marked at ~10 Mya, ~20 Mya, ~150 Mya and >350 Mya]
Species shown: S. cerevisiae, S. cariocanus, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. pastorianus, S. servazzii, S. unisporus, S. exiguus, S. dairenensis, S. castellii, S. kluyveri, Kluyveromyces lactis, Schizosaccharomyces pombe

Case Study: Coding vs. non-coding
(Gene diagram: ATG, ORF, TAA)

Non-Coding DNA
- regulatory functions
- short (5-15 bp)
- degenerate
- variable spacing

Coding DNA
- codes for protein
- triplet code
- open reading frame (ORF)
- tend to be long (50-500 bp)
- highly constrained
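The "open reading frame" idea from the coding column can be illustrated with a small scanner. This is a simplified sketch under stated assumptions: it uses the standard stop codons (TAA, TAG, TGA), searches only the forward strand, and the function name and `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return 0-based, half-open (start, end) spans that run from ATG to a stop codon."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # span includes the stop codon
                start = None
    return orfs


example = "CCATGAAATTTTAAGG"
for begin, end in find_orfs(example):
    print(begin, end, example[begin:end])
```

A real gene finder would also scan the reverse complement and apply a much larger length cutoff; the triplet structure checked here is the property that distinguishes coding from non-coding sequence on the slide.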

CASE 1: Non-Coding ATG GAL4 TAA

Closely-related sequences are uninformative
(GAL4 promoter region through the ATG; S. paradoxus vs. S. cerevisiae)

paradoxus  TCTTCTGAGACAGCATCACTTCTTCTTNTTTTTTACATAACTTATTCTTCTATAATTTTC
cerevisiae TCCTTTGAGACAGCATTCGCCCAGTATTTTTTTTATTCTACA-AACCTTCTATAATTT-C
           ** * ***********     *    * *******    **  *  ************ *

paradoxus  AACGTATTTACATAGTTCTGTATCAGTTTAATCACCATAATATTGTTTTCCCTCAACTAA
cerevisiae AAAGTATTTACATAATTCTGTATCAGTTTAATCACCATAATATCGTTTTCT-----TTGT
           ** *********** **************************** ******       *

paradoxus  TGAATGCAATTAGATTTTCTTATTGTTCCCTCGCGGCTTTTTTTTGTTTTATAATCTATT
cerevisiae TTAGTGCAATTAATTTTTCCTATTGTTACTTCG-GGCCTTTTTCTGTTTTATGAGCTATT
           * * ********  ***** ******* * *** *** ***** ******** * *****

paradoxus  TTTTCCGTCATTTCTTCCCCAGATTTCCAACTTCATCTCCAGATTGTGTCTATGTAATGC
cerevisiae TTTTCCGTCATC-CTTCCCCAGATTTTCAGCTTCATCTCCAGATTGTGTCTACGTAATGC
           ***********  ************* ** ********************** *******

paradoxus  ATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGCTACTGTCT
cerevisiae ACGCCATCATTTTAAGAGAGGACAGAGAAGCAAGCCTCCTGAAAGATGAAGCTACTGTCT
           * ** ***** ** *** * ** ****** *** ********** ***************
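One way to quantify "uninformative" is the percent identity of the pairwise alignment: when nearly every column matches, conserved functional elements cannot be told apart from sequence that simply has not had time to change. The helper below is a sketch, not part of the lecture; skipping gap columns is an assumption, and the two strings are one 60-column block of the paradoxus/cerevisiae alignment.

```python
def percent_identity(a, b):
    """Percent of gap-free aligned columns in which the two sequences match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gap columns are excluded from the count
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


paradoxus = "TTTTCCGTCATTTCTTCCCCAGATTTCCAACTTCATCTCCAGATTGTGTCTATGTAATGC"
cerevisiae = "TTTTCCGTCATC-CTTCCCCAGATTTTCAGCTTCATCTCCAGATTGTGTCTACGTAATGC"
print(f"{percent_identity(paradoxus, cerevisiae):.1f}% identical")
```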

Distantly-related sequences do not align
(GAL4 noncoding region (promoter); S. cerevisiae vs. S. castellii)

cerevisiae ACTTACCAT-CAAC-CATAGATGGGTAAAC---GGTTAGTAACTAGGAACACGAT
castellii  AGA-GTCAAACTTTTCGT ATA--TATATATAATATGTCTGATTGCTGGTT---T
           * ** * * * * * * * * *

Multiple sequence alignments reveal conserved elements
(GAL4 promoter region. Elements annotated on the slide: UAS1, UAS2, UES, two MIG1 sites, and the GAL4 ATG. Species labels also shown on the slide: cerevisiae, mikatae, bayanus, kudriavzevii.)

paradoxus  TGAGACAGCAT-CACTTCTT-CTTNTTTTTTACATAACTTATTCTTCTATAATTTTCAAC
kluyveri   TGAGACAGCATTCACTTCTTTCTTTTTTTTTACATATCTTATTCTTCTATAATTTTCAAC
cerevisiae TGAGACAGCATTCGCCCAGT--ATTTTTTTTAT-TCTACAAACCTTCTATAATTT-CAAA
bayanus    TGAGACTGCACTCCC--------TCTTCCTTTC------------TCCATAACTT---AC
           ****** ***  * *        * **  **              ** **** **   *

paradoxus  GTATTTACATAGTTCTGTATCAGTTTAATCACCATAAT------ATTGTTTTCCCTCAAC
kluyveri   GTATTTACATAGTTCTGTATCAGTTTAATCACCATAAT------ATTGTTTTCCCTCAAC
cerevisiae GTATTTACATAATTCTGTATCAGTTTAATCACCATAAT------ATCGTTTTCTTTGT--
bayanus    TTATTTACATAGTTTTGTATCAGTTTAATCACCATAATCGTAACACCGTTTTACCTCACC
            ********** ** ***********************      *  *****   *

paradoxus  TAATGAATGCAATTAGATTTTC-TTATTGTTCCC-TCGCGGCTTTTTTTTGTTTTATAAT
kluyveri   TAATGAATGCAATTAGATTTTCCTTATTGTTCCCCTCGCGGCTTTTTTTTGTTTTATAAT
cerevisiae ---TTAGTGCAATTAATTTTTC-CTATTGTTACT-TCG-GGCCTTTTTCTGTTTTATGAG
bayanus    TGATGCGGG--A---ATCCTTC-AGACCGTTCTC-TCGCGC-------------------
              *    *  *       ***   *  ***    *** *

paradoxus  -CTATTTTTTCCGTCATTTCTTCCCC-AGATTTCCAACTTCAT-CTCCAGATTGTGTCTA
kluyveri   ACTATTTTTTCCGTCATTTCTTCCCCCAGATTTCCAACTTCATACTCCAGATTGTGTCTA
cerevisiae -CTATTTTTTCCGTCATC-CTTCCCC-AGATTTTCAGCTTCAT-CTCCAGATTGTGTCTA
bayanus    -CTTTTTTTTTCGTCATTTCTTCCCC-AGATCTACAACTTTAA-CTCCAGACGGTGTATA
            ** ****** ******  ******* **** * ** *** *  *******  **** **

paradoxus  TGTAATGCATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGC
kluyveri   TGTAATGCATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGC
cerevisiae CGTAATGCACGCCATCATTTTAAGAGAGGACAGAGAAGCAAGCCTCCTGAAAGATGAAGC
bayanus    GGCAGTACAAGCAGTGCTTTTGGGAAGAGGCAAAGCTGCAGACCTCGAGAACAATGAAGC
            * * * ** **  *  * ** **  **   *  * **   **  ****  ***  *******

CASE 2: Coding ATG CLN3 TAA

Closely-related sequences are uninformative

Less distantly related species are not informative either

Distantly related species reveal functional protein domains

Identification of Multi-Species Conserved Regions (MCS)

Human cccattcttttccaagtgtctccg--cctgcagcgattaggttagaaagcatttctctct
Chimp cccattcttttccaagtgtctccg--cctgcagcgattaggttagaaagcatttctctct
Mouse ttcagtcgtttcccagtgtctctga-cattcagagactactttagtaagcattt-tctct
Rat   tcagtccttccctggcatctccag-cactcaa-gactactttagtaagcattt-tctctg
Dog   tcaatgactttcccagtctcttctactgggaagagattaggttgcaaatcatttttctct
      * * * * * * **

How can we decide if this region is conserved?
Margulies et al. (2003) Genome Res. 13:2507-18

Binomial-Based Method for Detection of MCS

H:  AATGG
Mouse:  AATCG
Status: CCCDC

p = chance that a site is the same between human and mouse, q = 1 - p
For an alignment N base pairs long with n identities, calculate the cumulative binomial probability as:

$$P(X \ge n) = \sum_{i=n}^{N} \binom{N}{i} \, p^{i} \, q^{N-i}$$

Margulies et al. (2003) Genome Res. 13:2507-18
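The cumulative binomial probability above can be evaluated directly. The sketch below is a plain transcription of the formula; the function name is hypothetical, and the value p = 0.5 in the example is chosen purely for illustration (the slide does not give an estimate of p). The example uses the five-base human/mouse toy alignment, where N = 5 and n = 4.

```python
from math import comb


def binomial_tail(n_identical, length, p):
    """P(X >= n) for X ~ Binomial(N, p): chance of at least n identities in N sites."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    q = 1.0 - p
    return sum(
        comb(length, i) * p**i * q**(length - i)
        for i in range(n_identical, length + 1)
    )


human, mouse = "AATGG", "AATCG"
n = sum(h == m for h, m in zip(human, mouse))  # 4 identical sites
N = len(human)                                 # 5 aligned sites
print(binomial_tail(n, N, p=0.5))              # illustrative p, not an estimate
```

A small tail probability means that so many identities would rarely arise at the background identity rate p, which is the sense in which a window is called conserved.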

How to score human-mouse conservation?

$$\text{score} = \frac{M - \mu}{\sigma}$$

1) Look at 50 bp windows that align
2) M is the number of identical bases in a particular 50 bp alignment
3) µ is the average number of identical residues in 50 bp alignments of local ancient, syntenic repeats (neutral)
4) σ is the standard deviation of the identity counts averaged in µ
Nature (2002) 420: 520-62
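The score is a standard z-score of a window's identity count against the neutral background. The sketch below assumes the neutral identity counts are already available as a list of integers; the counts shown are made-up illustrative numbers, the function names are hypothetical, and using the population standard deviation (`pstdev`) rather than the sample version is an assumption.

```python
from statistics import mean, pstdev


def neutral_parameters(neutral_counts):
    """Estimate mu and sigma from identity counts of neutral 50 bp windows."""
    return mean(neutral_counts), pstdev(neutral_counts)


def conservation_score(identical_bases, mu, sigma):
    """score = (M - mu) / sigma for one aligned window."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (identical_bases - mu) / sigma


# Hypothetical identity counts from neutral (ancient repeat) 50 bp windows.
neutral = [30, 34, 32, 36, 28]
mu, sigma = neutral_parameters(neutral)

# A window with 40 identical bases out of 50 scores well above neutral.
print(conservation_score(40, mu, sigma))
```

Windows with scores far above zero are more conserved than neutral sequence from the same region, which is how the comparison against local ancient repeats controls for regional variation in mutation rate.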

5% Conserved between Human-Mouse
[Figure legend: Red = neutral, Blue = observed genomic, Gray = estimated selection]
(20% of windows under selection) × (25% of bp in alignments) = 5%
Nature (2002) 420: 520-62

What does 5% conservation mean?
Only 1.5% of the genome is coding sequence
5' UTRs, 3' UTRs, promoters, and introns do not make up the difference

Problem with resolution
Answer: Sequence more genomes (maybe)!
Eddy 2005: Binomial model for power calculations

Tree Topology Influences Power
[Figure: star phylogeny vs. actual phylogeny for species A, B, C, D, E, F]

Ultraconserved Sequences
481 sequences longer than 200 bp are 100% identical between orthologous regions of human, mouse, and rat
Most are conserved at 99% in chicken and dog too
5000 sequences longer than 100 bp are 100% identical in these species
Bejerano et al. (2004) Science 304: 1321-1325

Olig2
[Figure: region 100 kb upstream of Olig2]

So what do they do?