Computational personal genomics: selection, regulation, epigenomics, disease

Samankaltaiset tiedostot
Genomic and epigenomic signatures for interpreting complex disease

Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks. David R. Kelley

Functional Genomics & Proteomics

State of the Union... Functional Genomics Research Stream. Molecular Biology. Genomics. Computational Biology

Eukaryotic Comparative Genomics

11/17/11. Gene Regulation. Gene Regulation. Gene Regulation. Finding Regulatory Motifs in DNA Sequences. Regulatory Proteins

Chapter 7. Motif finding (week 11) Chapter 8. Sequence binning (week 11)

CS284A Representations & Algorithms for Molecular Biology. Xiaohui S. Xie University of California, Irvine

Plasmid Name: pmm290. Aliases: none known. Length: bp. Constructed by: Mike Moser/Cristina Swanson. Last updated: 17 August 2009

Gap-filling methods for CH 4 data

Viral DNA as a model for coil to globule transition

MALE ADULT FIBROBLAST LINE (82-6hTERT)

FETAL FIBROBLASTS, PASSAGE 10

7.4 Variability management

VIIKKI BIOCENTER University of Helsinki

Master's Programme in Life Science Technologies (LifeTech) Prof. Juho Rousu Director of the Life Science Technologies programme 3.1.

Predicting evolutionarily conserved regions (ECRs) in the Xenopus tropicalis genome using a MultiPipMaker-based bioinformatic strategy

Efficiency change over time

Epigeneettinen säätely ja genomin leimautuminen. Tiina Immonen BLL Biokemia ja kehitysbiologia

Experimental Identification and Computational Characterization of a Novel. Extracellular Metalloproteinase Produced by Clostridium sordellii

LIFE2000 rahoitettavat hankkeet

Collaborative & Co-Creative Design in the Semogen -projects

Methods S1. Sequences relevant to the constructed strains, Related to Figures 1-6.

Supporting Information for

7. Product-line architectures

Genome 373: Genomic Informatics. Professors Elhanan Borenstein and Jay Shendure

Valuation of Asian Quanto- Basket Options

Tieteellinen yhteistyö lääkekehityksessä merkittävistä geenivarianteista (IPHG)

Chapter 9 Motif finding. Chaochun Wei Spring 2019

Bioinformatics in Laboratory of Computer and Information Science

arxiv: v1 [stat.ml] 27 Feb 2011

TESTBED FOR NEXT GENERATION REASEARCH & INNOVATION

Bioinformatics. Sequence Analysis: Part III. Pattern Searching and Gene Finding. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute

Inferring Trichoderma reesei gene regulatory network

Biopankit ja Big Data terveydenhuollossa: onko open science magic bullet?

Supporting information

Manolis Kellis Piotr Indyk

Paikkatiedon semanttinen mallinnus, integrointi ja julkaiseminen Case Suomalainen ajallinen paikkaontologia SAPO

Smart specialisation for regions and international collaboration Smart Pilots Seminar

Alternative DEA Models

KONEISTUSKOKOONPANON TEKEMINEN NX10-YMPÄRISTÖSSÄ

tgg agg Supplementary Figure S1.

Elintarvikealan tutkimus, kehitys ja innovointi Oulun yliopistossa biotieteen näkökulma Hely Häggman

Telecommunication Software

Supplementary information: Biocatalysis on the surface of Escherichia coli: melanin pigmentation of the cell. exterior

Other approaches to restrict multipliers

LYTH-CONS CONSISTENCY TRANSMITTER

Statistical design. Tuomas Selander

Riitta Kilpeläinen Elia Liitiäinen Belle Selene Xia University of Eastern Finland Department of Forest Sciences Department of Economics and HECER

Capacity Utilization

Naisnäkökulma sijoittamiseen Vesa Puttonen

Use of spatial data in the new production environment and in a data warehouse

Enterprise Architecture TJTSE Yrityksen kokonaisarkkitehtuuri

BDD (behavior-driven development) suunnittelumenetelmän käyttö open source projektissa, case: SpecFlow/.NET.

VIIKKI BIOCENTER University of Helsinki

GOOD WORK LONGER CAREER:

Uusi rekisteripohjainen diabetesluokitus kohti täsmähoitoa

A new model of regional development work in habilitation of children - Good habilitation in functional networks

Land-Use Model for the Helsinki Metropolitan Area

VAASAN YLIOPISTO Humanististen tieteiden kandidaatin tutkinto / Filosofian maisterin tutkinto

Bounds on non-surjective cellular automata

Tupakointi ja geenit. Jaakko Kaprio HY ja KTL

Co-Design Yhteissuunnittelu

Lataa Cognitive Function in Opioid Substitution Treated Patiens - Pekka Rapeli. Lataa

ELEMET- MOCASTRO. Effect of grain size on A 3 temperatures in C-Mn and low alloyed steels - Gleeble tests and predictions. Period

Biopankit ja Big Data terveydenhuollossa: onko open science magic bullet?

MUSEOT KULTTUURIPALVELUINA

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

Technische Daten Technical data Tekniset tiedot Hawker perfect plus

LANSEERAUS LÄHESTYY AIKATAULU OMINAISUUDET. Sähköinen jäsenkortti. Yksinkertainen tapa lähettää viestejä jäsenille

Genomin ilmentyminen

Uusi Ajatus Löytyy Luonnosta 4 (käsikirja) (Finnish Edition)

IAA Regulation Committee

Toimintamallit happamuuden ennakoimiseksi ja riskien hallitsemiseksi turvetuotantoalueilla (Sulfa II)

Innovation Platform Thinking Jukka P. Saarinen Mika M. Raunio Nadja Nordling Taina Ketola Anniina Heinikangas Petri Räsänen

SFS/SR315 Tekoäly Tekoälyn standardisointi


National Building Code of Finland, Part D1, Building Water Supply and Sewerage Systems, Regulations and guidelines 2007

BLOCKCHAINS AND ODR: SMART CONTRACTS AS AN ALTERNATIVE TO ENFORCEMENT

Searching (Sub-)Strings. Ulf Leser

ELINPATOLOGIAN RYHMÄOPETUS MUNUAINEN

Ostamisen muutos muutti myynnin. Technopolis Business Breakfast

Särmäystyökalut kuvasto Press brake tools catalogue

Digitalisaation ja IT:n johtamisen vaatimat kyvykkyydet ja osaamisen kehittäminen

GLP-vaatimukset ATMP-valmisteiden turvallisuustutkimuksille

LD Score regression for estimating and partitioning heritability of lipid levels in the Finnish population

ProAgria. Opportunities For Success

Somaattinen sairaus nuoruudessa ja mielenterveyden häiriön puhkeamisen riski

Infrastruktuurin asemoituminen kansalliseen ja kansainväliseen kenttään Outi Ala-Honkola Tiedeasiantuntija

Mitä Piilaaksossa & globaalisti tapahtuu ja mitä Tekes voi tarjota yrityksille

Benchmarking Controlled Trial - a novel concept covering all observational effectiveness studies

Liikunnan vaikuttavuus ja kuntoutus

WAMS 2010,Ylivieska Monitoring service of energy efficiency in housing Jan Nyman,

Encapsulation. Imperative programming abstraction via subprograms Modular programming data abstraction. TTY Ohjelmistotekniikka

RANTALA SARI: Sairaanhoitajan eettisten ohjeiden tunnettavuus ja niiden käyttö hoitotyön tukena sisätautien vuodeosastolla

TIEKE Verkottaja Service Tools for electronic data interchange utilizers. Heikki Laaksamo

Elixir of life Elixir for Mind and Body

Genomin ilmentyminen Liisa Kauppi, Genomibiologian tutkimusohjelma

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

Kuolevaisuuden & Kuolemattomuuden Biologia

Transkriptio:

Computational personal genomics: selection, regulation, epigenomics, disease Manolis Kellis Broad Institute of MIT and Harvard MIT Computer Science & Artificial Intelligence Laboratory

TATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA ATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC AATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC GCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT TTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG ATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA CTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT TAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA TAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG TTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT GATATGCTTTGCGCCGTCAAAGTTTTGAACGAGAAAAATCCATCCATTACCTTAATAAATGCTGATCCCAAATTTGCTCAAAGGAA CGATTTGCCGTTGGACGGTTCTTATGTCACAATTGATCCTTCTGTGTCGGACTGGTCTAATTACTTTAAATGTGGTCTCCATGTTG ACTCTTTTCTAAAGAAACTTGCACCGGAAAGGTTTGCCAGTGCTCCTCTGGCCGGGCTGCAAGTCTTCTGTGAGGGTGATGTACCA GGCAGTGGATTGTCTTCTTCGGCCGCATTCATTTGTGCCGTTGCTTTAGCTGTTGTTAAAGCGAATATGGGCCCTGGTTATCATAT CAAGCAAAATTTAATGCGTATTACGGTCGTTGCAGAACATTATGTTGGTGTTAACAATGGCGGTATGGATCAGGCTGCCTCTGTTT GTGAGGAAGATCATGCTCTATACGTTGAGTTCAAACCGCAGTTGAAGGCTACTCCGTTTAAATTTCCGCAATTAAAAAACCATGAA AGCTTTGTTATTGCGAACACCCTTGTTGTATCTAACAAGTTTGAAACCGCCCCAACCAACTATAATTTAAGAGTGGTAGAAGTCAC AGCTGCAAATGTTTTAGCTGCCACGTACGGTGTTGTTTTACTTTCTGGAAAAGAAGGATCGAGCACGAATAAAGGTAATCTAAGAG TCATGAACGTTTATTATGCCAGATATCACAACATTTCCACACCCTGGAACGGCGATATTGAATCCGGCATCGAACGGTTAACAAAG CTAGTACTAGTTGAAGAGTCTCTCGCCAATAAGAAACAGGGCTTTAGTGTTGACGATGTCGCACAATCCTTGAATTGTTCTCGCGA ATTCACAAGAGACTACTTAACAACATCTCCAGTGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT TAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG GAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC TGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA AAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA TTGGGCAGCTGTCTATATGAATTATAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACT GCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATA TTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGG CCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAG TTGGCAAGTTGCCAACTGACGAGATGCAGTAAAAAGAGATTGCCGTCTTGAAACTTTTTGTCCTTTTTTTTTTCCGGGGACTCTAC AACCCTTTGTCCTACTGATTAATTTTGTACTGAATTTGGACAATTCAGATTTTAGTAGACAAGCGCGAGGAGGAAAAGAAATGACA AAATTCCGATGGACAAGAAGATAGGAAAAAAAAAAAGCTTTCACCGATTTCCTAGACCGGAAAAAAGTCGTATGACATCAGAATGA ATTTTCAAGTTAGACAAGGACAAAATCAGGACAAATTGTAAAGATATAATAAACTATTTGATTCAGCGCCAATTTGCCCTTTTCCA TCCATTAAATCTCTGTTCTCTCTTACTTATATGATGATTAGGTATCATCTGTATAAAACTCCTTTCTTAATTTCACTCTAAAGCAT CCATAGAGAAGATCTTTCGGTTCGAAGACATTCCTACGCATAATAAGAATAGGAGGGAATAATGCCAGACAATCTATCATTACATT GCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAA TATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATA CTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACA AGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATC ACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGT

TATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA ATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC AATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC GCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT TTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG ATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA CTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT TAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA TAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG TTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT GATATGCTTTGCGCCGTCAAAGTTTTGAACGAGAAAAATCCATCCATTACCTTAATAAATGCTGATCCCAAATTTGCTCAAAGGAA CGATTTGCCGTTGGACGGTTCTTATGTCACAATTGATCCTTCTGTGTCGGACTGGTCTAATTACTTTAAATGTGGTCTCCATGTTG ACTCTTTTCTAAAGAAACTTGCACCGGAAAGGTTTGCCAGTGCTCCTCTGGCCGGGCTGCAAGTCTTCTGTGAGGGTGATGTACCA Genes GGCAGTGGATTGTCTTCTTCGGCCGCATTCATTTGTGCCGTTGCTTTAGCTGTTGTTAAAGCGAATATGGGCCCTGGTTATCATAT Regulatory motifs CAAGCAAAATTTAATGCGTATTACGGTCGTTGCAGAACATTATGTTGGTGTTAACAATGGCGGTATGGATCAGGCTGCCTCTGTTT GTGAGGAAGATCATGCTCTATACGTTGAGTTCAAACCGCAGTTGAAGGCTACTCCGTTTAAATTTCCGCAATTAAAAAACCATGAA Encode AGCTTTGTTATTGCGAACACCCTTGTTGTATCTAACAAGTTTGAAACCGCCCCAACCAACTATAATTTAAGAGTGGTAGAAGTCAC Control AGCTGCAAATGTTTTAGCTGCCACGTACGGTGTTGTTTTACTTTCTGGAAAAGAAGGATCGAGCACGAATAAAGGTAATCTAAGAG proteins TCATGAACGTTTATTATGCCAGATATCACAACATTTCCACACCCTGGAACGGCGATATTGAATCCGGCATCGAACGGTTAACAAAG gene expression CTAGTACTAGTTGAAGAGTCTCTCGCCAATAAGAAACAGGGCTTTAGTGTTGACGATGTCGCACAATCCTTGAATTGTTCTCGCGA ATTCACAAGAGACTACTTAACAACATCTCCAGTGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT TAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG GAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC TGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA AAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA TTGGGCAGCTGTCTATATGAATTATAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACT GCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATA TTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGG CCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAG TTGGCAAGTTGCCAACTGACGAGATGCAGTAAAAAGAGATTGCCGTCTTGAAACTTTTTGTCCTTTTTTTTTTCCGGGGACTCTAC AACCCTTTGTCCTACTGATTAATTTTGTACTGAATTTGGACAATTCAGATTTTAGTAGACAAGCGCGAGGAGGAAAAGAAATGACA AAATTCCGATGGACAAGAAGATAGGAAAAAAAAAAAGCTTTCACCGATTTCCTAGACCGGAAAAAAGTCGTATGACATCAGAATGA ATTTTCAAGTTAGACAAGGACAAAATCAGGACAAATTGTAAAGATATAATAAACTATTTGATTCAGCGCCAATTTGCCCTTTTCCA TCCATTAAATCTCTGTTCTCTCTTACTTATATGATGATTAGGTATCATCTGTATAAAACTCCTTTCTTAATTTCACTCTAAAGCAT CCATAGAGAAGATCTTTCGGTTCGAAGACATTCCTACGCATAATAAGAATAGGAGGGAATAATGCCAGACAATCTATCATTACATT GCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAA TATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATA CTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACA AGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATC ACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGT

Family Inheritance Personal genomics today: 23 and We Me vs. my brother Recombination breakpoints Dad s mom My dad Mom s dad Disease risk Human ancestry Genomics: Regions mechanisms drugs Systems: genes combinations pathways

Understanding human variation and human disease Gene annotation (Coding, 5 /3 UTR, RNAs) Evolutionary signatures Non-coding annotation Chromatin signatures CATGACTG CATGCCTG Disease-associated variant (SNP/CNV/ ) Roles in gene/chromatin regulation Activator/repressor signatures Other evidence of function Signatures of selection (sp/pop) Challenge: from loci to mechanism, pathways, drug targets Goal: A systems-level understanding of genomes and gene regulation: The regulators: Transcription factors, micrornas, sequence specificities The regions: enhancers, promoters, and their tissue-specificity The targets: TFs targets, regulators enhancers, enhancers genes The grammars: Interplay of multiple TFs prediction of gene expression The parts list = Building blocks of gene regulatory networks

Tools for interpreting the human genome Evolutionary signatures Genome annotation Distinct signatures for proteins, ncrnas, mirnas, motifs Read-through, excess-constraint, networks,human selection Chromatin signatures Dynamic regulatory regions Define chromatin states from combinations of histone marks Distinct classes of promoter/enhancer/transcribed/repressed Activity signatures Link enhancer networks Activity-based linking of regulators enhancers targets Testing of 1000s of enhancer activator / repressor motifs Personal genomics Interpret disease mechanism Disease-associations: Mechanistic predictions for variants Beyond top hits: 2000+ GWAS T1D-associated enhancers Global methylation changes in Alzheimer s: NRSF, ELK1, CTCF

Evolutionary signatures reveal genes, RNAs, motifs Compare 29 mammals Increased conservation pinpoints functional regions Lindblad-Toh Nature 2011 Stark Nature 2007 Distinct patterns of change distinguish diff. functions Protein-coding genes - Codon Substitution Frequencies - Reading Frame Conservation RNA structures - Compensatory changes - Silent G-U substitutions micrornas - Shape of conservation profile - Structural features: loops, pairs - Relationship with 3 UTR motifs Regulatory motifs - Mutations preserve consensus - Increased Branch Length Score - Genome-wide conservation

Compare 29 mammals: Reveal constrained positions NRSF motif Reveal individual transcription factor binding sites Within motif instances reveal position-specific bias More species: motif consensus directly revealed

Human constraint outside conserved regions Active regions Average diversity (heterozygosity) Aggregate over the genome Ward and Kellis Science 2012 Non-conserved regions: ENCODE-active regions show reduced diversity Lineage-specific constraint in biochemically-active regions Conserved regions: Non-ENCODE regions show increased diversity Loss of constraint in human when biochemically-inactive

Human-specific enhancer functions play regulatory roles Transcription initiation from Pol2 promoter Transcription coactivator activity Transcription factor binding Chromatin binding Negative regulation of transcription, DNA-dependent Transcription factor complex Protein complex Protein kinase activity Nerve growth factor receptor signaling pathway Signal transducer activity Protein serine/threonine kinase activity Negative regulation of transcription from Pol2 prom Protein tyrosine kinase activity In utero embryonic development Regulatory genes: Transcription, Chromatin, Signaling. Developmental enhancers: embryo, nerve growth

Tools for interpreting the human genome Evolutionary signatures Genome annotation Distinct signatures for proteins, ncrnas, mirnas, motifs Read-through, excess-constraint, networks,human selection Chromatin signatures Dynamic regulatory regions Define chromatin states from combinations of histone marks Distinct classes of promoter/enhancer/transcribed/repressed Activity signatures Link enhancer networks Activity-based linking of regulators enhancers targets Testing of 1000s of enhancer activator / repressor motifs Personal genomics Interpret disease mechanism Disease-associations: Mechanistic predictions for variants Beyond top hits: 2000+ GWAS T1D-associated enhancers Global methylation changes in Alzheimer s: NRSF, ELK1, CTCF

Integrate epigenomics datasets in multiple cell types 2. DNA methylation 3. DNA accessibility Epigenomic maps 1. Histone modifications Epigenetic modifications DNA/histone/nucleosome Encode epigenetic state Histone code hypothesis Distinct function for distinct combinations of marks? Hundreds of histone marks Astronomical number of histone mark combinations How do we find biologically relevant ones? Unsupervised approach Probabilistic model Explicit combinatorics

Chromatin state dynamics across nine cell types Predicted linking Correlated activity Single annotation track for each cell type Summarize cell-type activity at a glance Can study 9-cell activity pattern across

Epigenomics Roadmap: 90 complete epigenomes Interpret GWAS, global effects, reveal relevant cell types

Enhancer modules associated with tissue identity Clustering of 500,000 distal enhancers Tissue-specific disease-relevant tissues and processes

Dissect motifs in 10,000s of human enhancers 54000+ measurements (x2 cells, 2x repl) Predict activators/repressors based on activity correlations Validate by engineering enhancers disrupting causal motifs

Example activator: conserved HNF4 motif match WT expression specific to HepG2 Motif match disruptions reduce expression to background Non-disruptive changes maintain expression Random changes depend on effect to motif match

Tools for interpreting the human genome Evolutionary signatures Genome annotation Distinct signatures for proteins, ncrnas, mirnas, motifs Read-through, excess-constraint, networks,human selection Chromatin signatures Dynamic regulatory regions Define chromatin states from combinations of histone marks Distinct classes of promoter/enhancer/transcribed/repressed Activity signatures Link enhancer networks Activity-based linking of regulators enhancers targets Testing of 1000s of enhancer activator / repressor motifs Personal genomics Interpret disease mechanism Disease-associations: Mechanistic predictions for variants Beyond top hits: 2000+ GWAS T1D-associated enhancers Global methylation changes in Alzheimer s: NRSF, ELK1, CTCF

Revisiting diseaseassociated variants xx Disease-associated SNPs enriched for enhancers in relevant cell types E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator

Variation in methylation patterns largely genotype driven Global signature of repression in 1000s regulatory regions: hypermethylation, enhancer states, brain regulator targets G E D: Global repression of brain enhancers in AD MAP Memory and Aging Project + ROS Religious Order Study Dorsolateral PFC Genotype (1M SNPs x700 ind.) Reference Chromatin states Methylation (450k probes x 700 ind)

Global hyper-methylation in 1000s of AD-associated loci P-value Top 7000 probes 480,000 probes, ranked by Alzheimer s association Methylation Alzheimer s-associated probes are hypermethylated Global effect across 1000s of probes Rank all probes by Alzheimer s association 7000 probes increase methylation (repressed) Enriched in brain-specific enhancers Near motifs of brain-specific regulators Complex disease: genome-wide effects

Covers computational challenges associated with personal genomics: - genotype phasing and haplotype reconstruction resolve mom/dad chromosomes - exploiting linkage for variant imputation co-inheritance patterns in human population - ancestry painting for admixed genomes result of human migration patterns - predicting likely causal variants using functional genomics from regions to mechanism - comparative genomics annotation of coding/non-coding elements gene regulation - relating regulatory variation to gene expression or chromatin quantitative trait loci - measuring recent evolution and human selection selective pressure shaped our genome - using systems/network information to decipher weak contributions combinatorics - challenge of complex multi-genic traits: height, diabetes, Alzheimer's 1000s of genes

MIT Computational Biology group Compbio.mit.edu Mike Lin Matt Eaton Stata3 Stata4 Angela Yen Manolis Kellis Ben Holmes Soheil Feizi Jason Ernst Luke Ward Jessica Wu Bob Altshuler Louisa DiStefano Irwin Jungreis Mukul Bansal Daniel Marbach Sushmita Roy Chris Bristow Dave Hendrix Pouya Kheradpour Rachel Sealfon Loyal Goff Stefan Washietl