Genomic and epigenomic signatures for interpreting complex disease

Samankaltaiset tiedostot
Computational personal genomics: selection, regulation, epigenomics, disease

Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks. David R. Kelley

State of the Union... Functional Genomics Research Stream. Molecular Biology. Genomics. Computational Biology

Functional Genomics & Proteomics

Eukaryotic Comparative Genomics

MALE ADULT FIBROBLAST LINE (82-6hTERT)

FETAL FIBROBLASTS, PASSAGE 10

7.4 Variability management

Genome 373: Genomic Informatics. Professors Elhanan Borenstein and Jay Shendure

11/17/11. Gene Regulation. Gene Regulation. Gene Regulation. Finding Regulatory Motifs in DNA Sequences. Regulatory Proteins

Plasmid Name: pmm290. Aliases: none known. Length: bp. Constructed by: Mike Moser/Cristina Swanson. Last updated: 17 August 2009

Experimental Identification and Computational Characterization of a Novel. Extracellular Metalloproteinase Produced by Clostridium sordellii

tgg agg Supplementary Figure S1.

Capacity Utilization

Efficiency change over time

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

CS284A Representations & Algorithms for Molecular Biology. Xiaohui S. Xie University of California, Irvine

Bioinformatics in Laboratory of Computer and Information Science

Gap-filling methods for CH 4 data

Use of spatial data in the new production environment and in a data warehouse

Alternative DEA Models

Collaborative & Co-Creative Design in the Semogen -projects

Chapter 7. Motif finding (week 11) Chapter 8. Sequence binning (week 11)

On instrument costs in decentralized macroeconomic decision making (Helsingin Kauppakorkeakoulun julkaisuja ; D-31)

Supplementary information: Biocatalysis on the surface of Escherichia coli: melanin pigmentation of the cell. exterior

BLOCKCHAINS AND ODR: SMART CONTRACTS AS AN ALTERNATIVE TO ENFORCEMENT

Master's Programme in Life Science Technologies (LifeTech) Prof. Juho Rousu Director of the Life Science Technologies programme 3.1.

GOOD WORK LONGER CAREER:

Methods S1. Sequences relevant to the constructed strains, Related to Figures 1-6.

MUSEOT KULTTUURIPALVELUINA

7. Product-line architectures

TESTBED FOR NEXT GENERATION REASEARCH & INNOVATION

Other approaches to restrict multipliers

Lataa Cognitive Function in Opioid Substitution Treated Patiens - Pekka Rapeli. Lataa

Results on the new polydrug use questions in the Finnish TDI data

The CCR Model and Production Correspondence

Valuation of Asian Quanto- Basket Options

Predicting evolutionarily conserved regions (ECRs) in the Xenopus tropicalis genome using a MultiPipMaker-based bioinformatic strategy

Bioinformatics. Sequence Analysis: Part III. Pattern Searching and Gene Finding. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute

VIIKKI BIOCENTER University of Helsinki

Increase of opioid use in Finland when is there enough key indicator data to state a trend?

Returns to Scale II. S ysteemianalyysin. Laboratorio. Esitelmä 8 Timo Salminen. Teknillinen korkeakoulu

Guideline on Similar biological medicinal products containing biotechnology-derived proteins as active substance: non-clinical and clinical issues

Bounds on non-surjective cellular automata

The BaltCICA Project Climate Change: Impacts, Costs and Adaptation in the Baltic Sea Region

Tupakointi ja geenit. Jaakko Kaprio HY ja KTL

AYYE 9/ HOUSING POLICY

BDD (behavior-driven development) suunnittelumenetelmän käyttö open source projektissa, case: SpecFlow/.NET.

LIFE2000 rahoitettavat hankkeet

Epigeneettinen säätely ja genomin leimautuminen. Tiina Immonen BLL Biokemia ja kehitysbiologia

Supporting Information for

Infrastruktuurin asemoituminen kansalliseen ja kansainväliseen kenttään Outi Ala-Honkola Tiedeasiantuntija

TM ETRS-TM35FIN-ETRS89 WTG

Metsälamminkankaan tuulivoimapuiston osayleiskaava

DIGITAL MARKETING LANDSCAPE. Maatalous-metsätieteellinen tiedekunta

Tarua vai totta: sähkön vähittäismarkkina ei toimi? Satu Viljainen Professori, sähkömarkkinat

Chapter 9 Motif finding. Chaochun Wei Spring 2019

Inferring Trichoderma reesei gene regulatory network

arxiv: v1 [stat.ml] 27 Feb 2011

enzymatic families regulated in response to a short-term cytokinin treatment in root apices

Tieteellinen yhteistyö lääkekehityksessä merkittävistä geenivarianteista (IPHG)

Statistical design. Tuomas Selander

Liikunnan vaikuttavuus ja kuntoutus

ELINPATOLOGIAN RYHMÄOPETUS MUNUAINEN

Constructive Alignment in Specialisation Studies in Industrial Pharmacy in Finland

Ikääntymistutkimuksen yhteiseurooppalaiset hankkeet ja aloitteet


Kaivostoiminnan eri vaiheiden kumulatiivisten vaikutusten huomioimisen kehittäminen suomalaisessa luonnonsuojelulainsäädännössä

Supporting information

TM ETRS-TM35FIN-ETRS89 WTG

C++11 seminaari, kevät Johannes Koskinen

Naisnäkökulma sijoittamiseen Vesa Puttonen

Ostamisen muutos muutti myynnin. Technopolis Business Breakfast

Biopankit ja Big Data terveydenhuollossa: onko open science magic bullet?

21~--~--~r--1~~--~--~~r--1~

LYTH-CONS CONSISTENCY TRANSMITTER

TM ETRS-TM35FIN-ETRS89 WTG

Large Scale Sequence Analysis with Applications to Genomics

Land-Use Model for the Helsinki Metropolitan Area

1. SIT. The handler and dog stop with the dog sitting at heel. When the dog is sitting, the handler cues the dog to heel forward.

ELEMET- MOCASTRO. Effect of grain size on A 3 temperatures in C-Mn and low alloyed steels - Gleeble tests and predictions. Period

Viral DNA as a model for coil to globule transition

Smart specialisation for regions and international collaboration Smart Pilots Seminar

KONEISTUSKOKOONPANON TEKEMINEN NX10-YMPÄRISTÖSSÄ

TM ETRS-TM35FIN-ETRS89 WTG

Co-Design Yhteissuunnittelu

WindPRO version joulu 2012 Printed/Page :47 / 1. SHADOW - Main Result

( ( OX2 Perkkiö. Rakennuskanta. Varjostus. 9 x N131 x HH145

ReFuel 70 % Emission Reduction Using Renewable High Cetane Number Paraffinic Diesel Fuel. Kalle Lehto, Aalto-yliopisto 5.5.

Tynnyrivaara, OX2 Tuulivoimahanke. ( Layout 9 x N131 x HH145. Rakennukset Asuinrakennus Lomarakennus 9 x N131 x HH145 Varjostus 1 h/a 8 h/a 20 h/a

RANTALA SARI: Sairaanhoitajan eettisten ohjeiden tunnettavuus ja niiden käyttö hoitotyön tukena sisätautien vuodeosastolla

TM ETRS-TM35FIN-ETRS89 WTG

,0 Yes ,0 120, ,8

WindPRO version joulu 2012 Printed/Page :42 / 1. SHADOW - Main Result

Technische Daten Technical data Tekniset tiedot Hawker perfect plus

WAMS 2010,Ylivieska Monitoring service of energy efficiency in housing Jan Nyman,

rapid evolution in copy number, location and sequence, with diverse turnover mechanisms.

3 9-VUOTIAIDEN LASTEN SUORIUTUMINEN BOSTONIN NIMENTÄTESTISTÄ

Information on preparing Presentation

TM ETRS-TM35FIN-ETRS89 WTG

Transkriptio:

Genomic and epigenomic signatures for interpreting complex disease Manolis Kellis Broad Institute of MIT and Harvard MIT Computer Science & Artificial Intelligence Laboratory

TATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA ATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC AATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC GCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT TTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG ATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA CTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT TAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA TAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG TTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT GATATGCTTTGCGCCGTCAAAGTTTTGAACGAGAAAAATCCATCCATTACCTTAATAAATGCTGATCCCAAATTTGCTCAAAGGAA CGATTTGCCGTTGGACGGTTCTTATGTCACAATTGATCCTTCTGTGTCGGACTGGTCTAATTACTTTAAATGTGGTCTCCATGTTG ACTCTTTTCTAAAGAAACTTGCACCGGAAAGGTTTGCCAGTGCTCCTCTGGCCGGGCTGCAAGTCTTCTGTGAGGGTGATGTACCA GGCAGTGGATTGTCTTCTTCGGCCGCATTCATTTGTGCCGTTGCTTTAGCTGTTGTTAAAGCGAATATGGGCCCTGGTTATCATAT CAAGCAAAATTTAATGCGTATTACGGTCGTTGCAGAACATTATGTTGGTGTTAACAATGGCGGTATGGATCAGGCTGCCTCTGTTT GTGAGGAAGATCATGCTCTATACGTTGAGTTCAAACCGCAGTTGAAGGCTACTCCGTTTAAATTTCCGCAATTAAAAAACCATGAA AGCTTTGTTATTGCGAACACCCTTGTTGTATCTAACAAGTTTGAAACCGCCCCAACCAACTATAATTTAAGAGTGGTAGAAGTCAC AGCTGCAAATGTTTTAGCTGCCACGTACGGTGTTGTTTTACTTTCTGGAAAAGAAGGATCGAGCACGAATAAAGGTAATCTAAGAG TCATGAACGTTTATTATGCCAGATATCACAACATTTCCACACCCTGGAACGGCGATATTGAATCCGGCATCGAACGGTTAACAAAG CTAGTACTAGTTGAAGAGTCTCTCGCCAATAAGAAACAGGGCTTTAGTGTTGACGATGTCGCACAATCCTTGAATTGTTCTCGCGA ATTCACAAGAGACTACTTAACAACATCTCCAGTGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT TAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG GAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC TGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA AAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA TTGGGCAGCTGTCTATATGAATTATAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACT GCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATA TTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGG CCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAG TTGGCAAGTTGCCAACTGACGAGATGCAGTAAAAAGAGATTGCCGTCTTGAAACTTTTTGTCCTTTTTTTTTTCCGGGGACTCTAC AACCCTTTGTCCTACTGATTAATTTTGTACTGAATTTGGACAATTCAGATTTTAGTAGACAAGCGCGAGGAGGAAAAGAAATGACA AAATTCCGATGGACAAGAAGATAGGAAAAAAAAAAAGCTTTCACCGATTTCCTAGACCGGAAAAAAGTCGTATGACATCAGAATGA ATTTTCAAGTTAGACAAGGACAAAATCAGGACAAATTGTAAAGATATAATAAACTATTTGATTCAGCGCCAATTTGCCCTTTTCCA TCCATTAAATCTCTGTTCTCTCTTACTTATATGATGATTAGGTATCATCTGTATAAAACTCCTTTCTTAATTTCACTCTAAAGCAT CCATAGAGAAGATCTTTCGGTTCGAAGACATTCCTACGCATAATAAGAATAGGAGGGAATAATGCCAGACAATCTATCATTACATT GCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAA TATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATA CTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACA AGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATC ACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGT

TATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA ATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC AATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC GCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT TTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG ATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA CTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT TAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA TAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG TTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT GATATGCTTTGCGCCGTCAAAGTTTTGAACGAGAAAAATCCATCCATTACCTTAATAAATGCTGATCCCAAATTTGCTCAAAGGAA CGATTTGCCGTTGGACGGTTCTTATGTCACAATTGATCCTTCTGTGTCGGACTGGTCTAATTACTTTAAATGTGGTCTCCATGTTG ACTCTTTTCTAAAGAAACTTGCACCGGAAAGGTTTGCCAGTGCTCCTCTGGCCGGGCTGCAAGTCTTCTGTGAGGGTGATGTACCA Genes GGCAGTGGATTGTCTTCTTCGGCCGCATTCATTTGTGCCGTTGCTTTAGCTGTTGTTAAAGCGAATATGGGCCCTGGTTATCATAT Regulatory motifs CAAGCAAAATTTAATGCGTATTACGGTCGTTGCAGAACATTATGTTGGTGTTAACAATGGCGGTATGGATCAGGCTGCCTCTGTTT GTGAGGAAGATCATGCTCTATACGTTGAGTTCAAACCGCAGTTGAAGGCTACTCCGTTTAAATTTCCGCAATTAAAAAACCATGAA Encode AGCTTTGTTATTGCGAACACCCTTGTTGTATCTAACAAGTTTGAAACCGCCCCAACCAACTATAATTTAAGAGTGGTAGAAGTCAC Control AGCTGCAAATGTTTTAGCTGCCACGTACGGTGTTGTTTTACTTTCTGGAAAAGAAGGATCGAGCACGAATAAAGGTAATCTAAGAG proteins TCATGAACGTTTATTATGCCAGATATCACAACATTTCCACACCCTGGAACGGCGATATTGAATCCGGCATCGAACGGTTAACAAAG gene expression CTAGTACTAGTTGAAGAGTCTCTCGCCAATAAGAAACAGGGCTTTAGTGTTGACGATGTCGCACAATCCTTGAATTGTTCTCGCGA ATTCACAAGAGACTACTTAACAACATCTCCAGTGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT TAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG GAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC TGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA AAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA TTGGGCAGCTGTCTATATGAATTATAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACT GCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATA TTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGG CCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAG TTGGCAAGTTGCCAACTGACGAGATGCAGTAAAAAGAGATTGCCGTCTTGAAACTTTTTGTCCTTTTTTTTTTCCGGGGACTCTAC AACCCTTTGTCCTACTGATTAATTTTGTACTGAATTTGGACAATTCAGATTTTAGTAGACAAGCGCGAGGAGGAAAAGAAATGACA AAATTCCGATGGACAAGAAGATAGGAAAAAAAAAAAGCTTTCACCGATTTCCTAGACCGGAAAAAAGTCGTATGACATCAGAATGA ATTTTCAAGTTAGACAAGGACAAAATCAGGACAAATTGTAAAGATATAATAAACTATTTGATTCAGCGCCAATTTGCCCTTTTCCA TCCATTAAATCTCTGTTCTCTCTTACTTATATGATGATTAGGTATCATCTGTATAAAACTCCTTTCTTAATTTCACTCTAAAGCAT CCATAGAGAAGATCTTTCGGTTCGAAGACATTCCTACGCATAATAAGAATAGGAGGGAATAATGCCAGACAATCTATCATTACATT GCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAA TATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATA CTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACA AGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATC ACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGT

Building systems-level views of genomes and disease Goal: A systems-level understanding of genomes and gene regulation: The regulators: Transcription factors, micrornas, sequence specificities The regions: enhancers, promoters, and their tissue-specificity The targets: TFs targets, regulators enhancers, enhancers genes The grammars: Interplay of multiple TFs prediction of gene expression The parts list = Building blocks of gene regulatory networks Our tools: Comparative genomics & large-scale experimental datasets. Evolutionary signatures for coding/non-coding genes, micrornas, motifs Chromatin signatures for regulatory regions and their tissue specificity Activity signatures for linking regulators enhancers target genes Predictive models for gene function, gene expression, chromatin state Integrative models = Define roles in development, health, disease

Challenge: interpreting disease-associated variants Gene annotation (Coding, 5 /3 UTR, RNAs) Evolutionary signatures Non-coding annotation Chromatin signatures CATGACTG CATGCCTG Disease-associated variant (SNP/CNV/ ) Roles in gene/chromatin regulation Activator/repressor signatures Other evidence of function Signatures of selection (sp/pop) GWAS, case-control, reveal disease-associated variants Molecular mechanism, cell-type specificity, drug targets Challenges towards interpreting disease variants Find true causative SNP among many candidates in LD Use causal variant: predict function, pathway, drug targets Non-coding variant: type of function, cell type of activity Regulatory variant: upstream regulators, downstream targets This talk: genomics tools for addressing these challenges

Goal: Towards personal systems genomics Family Inheritance Me vs. my brother Recombination breakpoints Dad s mom My dad Mom s dad Disease risk Human ancestry Genomics: Regions mechanisms drugs Systems: genes combinations pathways

Systems-level views of disease epigenomics Evolutionary signatures gene/genome annotation High-resolution annotation: genes, RNAs, motif instances Measuring selection within the human population Chromatin states for interpreting disease association Annotate dynamic regulatory elements in multiple cell types Activity-based linking of regulators enhancers targets Interpreting disease-associated sequence variants Mechanistic predictions for individual top-scoring SNPs Functional roles of 1000s of disease-associated SNPs Systematic manipulation of 2000+ human enhancers Test effect of single-motif and single-nucleotide disruptions Role of activator/repressor motifs, disease-associated SNPs Personal genomes/epigenomes in health and disease Allele-specific activity.alzheimer s brain methylation SNP Global repression of distal enhancers. NRSF, ELK1, CTCF

Large-scale comparative genomics datasets 29 mammals 12 flies 17 fungi Post-duplication 9 Yeasts 8 Candida Kellis Nature 2003 Nature 2004; Stark Nature 2007; Clark Nature 2007; Butler Nature 2009; Lindblad-Toh Nature 2011 P P N P P N P P Haploid Diploid Pre-dup

Comparative genomics and evolutionary signatures Comparative genomics can reveal functional elements For example: exons are deeply conserved to mouse, chicken, fish Many other elements are also strongly conserved: exons / regulatory? Can we also pinpoint specific functions of each region? Yes! Patterns of change distinguish different types of functional elements Specific function Selective pressures Patterns of mutation/inse/del Develop evolutionary signatures characteristic of each function Kellis Nature 2003 Nature 2004; Stark Nature 2007; Clark Nature 2007; Butler Nature 2009; Lindblad-Toh Nature 2011

Evolutionary signatures for diverse functions Protein-coding genes - Codon Substitution Frequencies - Reading Frame Conservation RNA structures - Compensatory changes - Silent G-U substitutions micrornas - Shape of conservation profile - Structural features: loops, pairs - Relationship with 3 UTR motifs Regulatory motifs - Mutations preserve consensus - Increased Branch Length Score - Genome-wide conservation Stark et al, Nature 2007

Implications for genome annotation / regulation Novel protein-coding genes Revised gene annotations Unusual gene structures Novel structural families Targeting, editing, stability Riboswitches in mammals Novel/expanded mir families mir/mir* arm cooperation Sense/anti-sense mir switches Novel regulatory motifs Regulatory motif instances TF/miRNA regulatory networks Single binding site resolution Stark et al, Nature 2007

Translational read-through in human & fly Overlapping selection in human exons Protein-coding conservation Jungreis, Genome Research 2011 Stop codon read through Continued proteincoding conservation 2 nd stop codon No more conserv Synonym. Substitut. Rate Lin, Genome Research 2011 Reveal splicing signals, RNA structures, enhancer motifs, dual-coding genes RNA structure families: ortholog/paralog cons Regions of codon-level positive selection distributed localized Parker Gen. Res. 2011 Ex:MAT2A S-adeosyl-methionic level detection Structure/loop sequence deep conservation Lindblad-Toh Nature 2011 Distributed vs. localized positive selection Immunity/taste vs. retinal/bone/secretion

NRSF motif Measuring constraint at individual nucleotides Reveal individual transcription factor binding sites Within motif instances reveal position-specific bias More species: motif consensus directly revealed

Detect SNPs that disrupt conserved regulatory motifs Functionally-associated SNPs enriched in states, constraint Prioritize candidates, increase resolution, disrupted motifs

Measuring selection within the human lineage

Human constraint outside conserved regions Active regions Average diversity (heterozygosity) Aggregate over the genome Non-conserved regions: ENCODE-active regions show reduced diversity Lineage-specific constraint in biochemically-active regions Conserved regions: Non-ENCODE regions show increased diversity Loss of constraint in human when biochemically-inactive

Strongest: motifs, short RNA, Dnase, ChIP, lncrna Significant derived allele depletion in active features

Bound motifs show increased human constraint Position-specific reduction in bound motif heterozygosity Aggregate across thousands of CTCF motif instances

Most constrained human-specific enhancer functions Transcription initiation from Pol2 promoter Transcription coactivator activity Transcription factor binding Chromatin binding Negative regulation of transcription, DNA-dependent Transcription factor complex Protein complex Protein kinase activity Nerve growth factor receptor signaling pathway Signal transducer activity Protein serine/threonine kinase activity Negative regulation of transcription from Pol2 prom Protein tyrosine kinase activity In utero embryonic development Regulatory genes: Transcription, Chromatin, Signaling. Developmental enhancers: embryo, nerve growth

Systems-level views of disease epigenomics Evolutionary signatures gene/genome annotation High-resolution annotation: genes, RNAs, motif instances Measuring selection within the human population Chromatin states for interpreting disease association Annotate dynamic regulatory elements in multiple cell types Activity-based linking of regulators enhancers targets Interpreting disease-associated sequence variants Mechanistic predictions for individual top-scoring SNPs Functional roles of 1000s of disease-associated SNPs Systematic manipulation of 2000+ human enhancers Test effect of single-motif and single-nucleotide disruptions Role of activator/repressor motifs, disease-associated SNPs Personal genomes/epigenomes in health and disease Allele-specific activity.alzheimer s brain methylation SNP Global repression of distal enhancers. NRSF, ELK1, CTCF

Chromatin signatures for genome annotation Epigenomic maps 1. DNA methylation 2. Histone modifications Ernst et al Nature Biotech 2010 3. DNA accessibility See also: Amos Tanay, Bill Noble.

ENCODE: Study nine marks in nine human cell lines 9 marks H3K4me1 H3K4me2 H3K4me3 H3K27ac H3K9ac H3K27me3 H4K20me1 H3K36me3 CTCF +WCE +RNA x 9 human cell types HUVEC NHEK GM12878 K562 HepG2 NHLF HMEC HSMM Umbilical vein endothelial Keratinocytes Lymphoblastoid Myelogenous leukemia Liver carcinoma Normal human lung fibroblast Mammary epithelial cell Skeletal muscle myoblasts H1 Embryonic Brad Bernstein ENCODE Chromatin Group 81 Chromatin Mark Tracks (2 81 combinations) Learned jointly across cell types (virtual concatenation) State definitions are common State locations are dynamic Ernst et al, Nature 2011

Chromatin states dynamics across nine cell types Predicted linking Correlated activity Single annotation track for each cell type Summarize cell-type activity at a glance Can study 9-cell activity pattern across

Introducing multi-cell activity profiles Gene Chromatin Active TF motif TF regulator Dip-aligned expression States enrichment expression motif biases HUVEC NHEK GM12878 K562 HepG2 NHLF HMEC HSMM H1 Link enhancers to target genes ON OFF Active enhancer Repressed Motif enrichment Motif depletion TF On TF Off Motif aligned Flat profile

Enhancer-gene links supported by eqtl-gene links eqtl study Individuals Indiv. 1 Indiv. 2 Indiv. 3 Indiv. 4 Indiv. 5 Indiv. 6 Indiv. 7 Indiv. 8 Indiv. 9-0.5-1.5-1.8 3.1 1.1-1.8-1.4 3.2 4.4 15kb A A A C A A A C C Validation rationale: Expression Quantitative Trait Loci (eqtls) provide independent SNP-to-gene links Do they agree with activity-based links? Example: Lymphoblastoid (GM) cells study Expression/genotype across 60 individuals (Montgomery et al, Nature 2010) 120 eqtls are eligible for enhancer-gene linking based on our datasets 51 actually linked (43%) using predictions 4-fold enrichment (10% exp. by chance) Expression level of gene Sequence variant at distal position Independent validation of links. Relevance to disease datasets. 25

Visualizing 10,000s predicted enhancer-gene links Overlapping regulatory units, both few and many Both upstream and downstream elements linked Enhancers correlate with sequence constraint 26

Introducing multi-cell activity profiles Gene Chromatin Active TF motif TF regulator Dip-aligned expression States enrichment expression motif biases HUVEC NHEK GM12878 K562 HepG2 NHLF HMEC HSMM H1 Link TFs to target enhancers Predict activators vs. repressors ON OFF Active enhancer Repressed Motif enrichment Motif depletion TF On TF Off Motif aligned Flat profile

Coordinated activity reveals activators/repressors Enhancer activity Activity signatures for each TF Ex1: Oct4 predicted activator of embryonic stem (ES) cells Ex2: Gfi1 repressor of K562/GM cells Enhancer networks: Regulator enhancer target gene

Causal motifs supported by dips & enhancer assays Predicted causal HNF motifs (that also showed dips) in HepG2 enhancers Dip evidence of TF binding (nucleosome displacement) Tarjei Mikkelsen Enhancer activity halved by single-motif disruption Motifs bound by TF, contribute to enhancers 29

Systems-level views of disease epigenomics Evolutionary signatures gene/genome annotation High-resolution annotation: genes, RNAs, motif instances Measuring selection within the human population Chromatin states for interpreting disease association Annotate dynamic regulatory elements in multiple cell types Activity-based linking of regulators enhancers targets Interpreting disease-associated sequence variants Mechanistic predictions for individual top-scoring SNPs Functional roles of 1000s of disease-associated SNPs Systematic manipulation of 2000+ human enhancers Test effect of single-motif and single-nucleotide disruptions Role of activator/repressor motifs, disease-associated SNPs Personal genomes/epigenomes in health and disease Allele-specific activity.alzheimer s brain methylation SNP Global repression of distal enhancers. NRSF, ELK1, CTCF

Interpreting disease-association signals Interpret variants using Epigenomics - Chromatin states: Enhancers, promoters, motifs - Enrichment in individual loci, across 1000s of SNPs in T1D CATGACTG CATGCCTG Genotype GWAS Disease Epigenome changes in disease

Revisiting diseaseassociated variants xx Disease-associated SNPs enriched for enhancers in relevant cell types E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator

Disrupt activator Ets-1 motif Loss of GM-specific activation Loss of enhancer function Loss of HLA-DRB1 expression Creation of repressor Gfi1 motif Gain K562-specific repression Loss of enhancer function Loss of CCDC162 expression Mechanistic predictions for top disease-associated SNPs Lupus erythromatosus in GM lymphoblastoid Erythrocyte phenotypes in K562 leukemia cells `

Allele-specific chromatin marks: cis-vs-trans effects Maternal and paternal GM12878 genomes sequenced Map reads to phased genome, handle SNPs indels Correlate activity changes with sequence differences

HaploReg: systematic ENCODE mining of variants (compbio.mit.edu/haploreg) Start with any list of SNPs or select a GWA study Mine publically available ENCODE data for significant hits Hundreds of assays, dozens of cells, conservation, motifs Report significant overlaps and link to info/browser

Functional enrichment for 1000s of SNPs

Full T1D association spectrum 1000s of causal SNPs Rank all SNPs by P-value Find chromatin states with enrichment in high ranks Signal spans 1000s of SNPs GM12878 K562 Lymphoblastoid Myelogenous leukemia GM12878 enhancer enrichment now seen Could bias in array design contribute to these enrichments? Evaluate all 1000 genomes SNPs by imputing those in LD Cell type specific: GM and K562 enhancers Chromatin state specific: Enhancers/promoters

Imputing SNPs in LD stronger cell/state separation Enhancers across cell types Chromatin states in GM12878 Promoters: 462 (excess 81) Enhancers: 2049 (excess 392) 1940 distinct loci (R^2<.8) Transcribed: 4740 (excess 522) Insulator: 240 (excess 23) Repressed: 1351 (excess 76) Other: 21k (deplete 1093) Excess of 30,000 SNPs 2049 enhancers (excess 392) Mostly found in independent loci (1730 with R 2 <0.2) Systematically measure their regulatory contributions

Systems-level views of disease epigenomics Evolutionary signatures gene/genome annotation High-resolution annotation: genes, RNAs, motif instances Measuring selection within the human population Chromatin states for interpreting disease association Annotate dynamic regulatory elements in multiple cell types Activity-based linking of regulators enhancers targets Interpreting disease-associated sequence variants Mechanistic predictions for individual top-scoring SNPs Functional roles of 1000s of disease-associated SNPs Systematic manipulation of 2000+ human enhancers Test effect of single-motif and single-nucleotide disruptions Role of activator/repressor motifs, disease-associated SNPs Personal genomes/epigenomes in health and disease Allele-specific activity.alzheimer s brain methylation SNP Global repression of distal enhancers. NRSF, ELK1, CTCF

High-throughput experiments: 10,000s enhancers Melnikov, Nature Biotech 2012 Experiment features: Multiplexed enhancer assays 10,000s of elements Each w/ unique barcode Multiple human cell types Repeat experiments on same array / diff barcodes Applied to: Test enhancer offsets Test causal motifs With: Tarjei Mikkelse Broad Institute, ARRA funds See also: Barak Cohen, Jay Shendure, Eran Segal

Systematic motif disruption for 5 activators and 2 repressors in 2 human cell lines 54000+ measurements (x2 cells, 2x repl)

Example activator: conserved HNF4 motif match WT expression specific to HepG2 Motif match disruptions reduce expression to background Non-disruptive changes maintain expression Random changes depend on effect to motif match

Results hold across 2000+ enhancers Scramble abolishes reporter expression Neutral mutations show no change Increasing mutations show more expression However, only 40% show wild-type expression: context?

Features of functional wildtype enhancers Nucleosome exclusion, motif conservation, other TFs Each of these features is encoded in primary sequence

Repressors of HepG2 enhancer act in K562 Repressor disruption aberrant expression in opposite cell types

Testing effect of SNP change in enhancer constructs SNPs in enhancer regions can lead to expression changes in downstream reporter genes Currently testing all T1D-associated enhancer SNPs

Systems-level views of disease epigenomics Evolutionary signatures gene/genome annotation High-resolution annotation: genes, RNAs, motif instances Measuring selection within the human population Chromatin states for interpreting disease association Annotate dynamic regulatory elements in multiple cell types Activity-based linking of regulators enhancers targets Interpreting disease-associated sequence variants Mechanistic predictions for individual top-scoring SNPs Functional roles of 1000s of disease-associated SNPs Systematic manipulation of 2000+ human enhancers Test effect of single-motif and single-nucleotide disruptions Role of activator/repressor motifs, disease-associated SNPs Personal genomes/epigenomes in health and disease Allele-specific activity.alzheimer s brain methylation SNP Global repression of distal enhancers. NRSF, ELK1, CTCF

Interpreting disease-association signals (1) Interpret variants using Epigenomics - Chromatin states: Enhancers, promoters, motifs - Enrichment in individual loci, across 1000s of SNPs in T1D CATGACTG CATGCCTG Genotype GWAS Disease mqtls MWAS Epigenome (2) Epigenome changes in disease - Intermediate molecular phenotypes associated with disease - Variation in brain methylomes of Alzheimer s patients

Phil de Jager: Methylation in 750 Alzheimer patients 750 individuals 500,000 methylation probes Phil de Jager, Roadmap disease epigenomics Genome Epigenome Phenotype Brad Bernstein REMC mapping meqtl Epigenome Classification MWAS Patients followed for 10+ years with cognitive evaluations Brain samples donated post-mortem methylation/genotype Seek predictive features: SNPs, QTLs, mqtls, regulation 1 2

P-value exponent (-log 10 P) 2,500 mqtls for neighboring SNPs at 10-14 Chromosome and genomic position Distance from CpG (MB) -1 1 Overlay Manhattan plots of 450,000 methylation probes Cutoff of 10-14 (10-8 after Bonferroni correction) Use to pinpoint disrupted motifs, predict epigenome 50

Focusing on 2831 most variable probes Probe intensity distribution Inter-individual variability Promoters: Stable low (active) Gene bodies: Stable high (active) Enhancers/poised: Most variable Hemi-methylated probes are also the most variable Tiny fraction (0.6%) of all probes

Multimodal SNP-associated Promoter-depleted Multimodal probes (~3Κ) 184 2,647 SNP-associated probes (29% of all) 138,731 All probes SNP-associated 1 Active promoter 2 Promoter flanking 3 Active enhancer 4 Weak enhancer 5 Gene bodies 93.5% of multimodal probes are SNP-associated Importance of distinguishing contribution of genotype to disease associations % of CpG probes 6 Active gene bodies 7 Repetitive 8 Heterochromatin 9 Low signal SNP-associated probes depleted in promoters (driven epigenetically>genetically, open chrom)

Phil de Jager: Methylation in 750 Alzheimer patients 750 individuals 500,000 methylation probes Phil de Jager, Roadmap disease epigenomics Genome Epigenome Phenotype Brad Bernstein REMC mapping meqtl Epigenome Classification MWAS Patients followed for 10+ years with cognitive evaluations Brain samples donated post-mortem methylation/genotype Seek predictive features: SNPs, QTLs, mqtls, regulation 1 2

Global hyper-methylation trend in AD-associated probes P-value Top 7000 probes Hypomethylated probes (active) Alzheimer s Normal 480,000 probes, ranked by Alzheimer s association Methylation Alzheimer s-associated probes are hypermethylated Global effect across 1000s of probes Rank all probes by Alzheimer s association Observe functional changes down ranklist 7000 probes show shift in methylation Hypermethylated probes (repressed) Alzheimer s Normal Complex disease: genome-wide effects, 1000s of loci

Chromatin state breakdown reveals activity Red: More methylated in Alhzeimer s Blue: Less methylated in Alzheimer s % probes Significant probes are in enhancers Not promoters 1 Active promoter 2 Promoter flanking 3 Active enhancer 4 Weak enhancer 5 Gene bodies 6 Active gene bodies 7 Repetitive 8 Heterochromatin 9 Low signal * => fisher exact test, p-value <= 0.001

Alzheimer s prediction vs. likely biological pathways NRSF All probes, ranked by AD assoc. P-value ELK1 All probes, ranked by AD assoc. P-value CTCF Predictive power: 6k probes + APOE Regulatory motifs associated with Alzheimer-associated probes suggest potential pathways We have not solved Alzheimer s, but new insights gained

Systems-level views of disease epigenomics Evolutionary signatures gene/genome annotation High-resolution annotation: genes, RNAs, motif instances Measuring selection within the human population Chromatin states for interpreting disease association Annotate dynamic regulatory elements in multiple cell types Activity-based linking of regulators enhancers targets Interpreting disease-associated sequence variants Mechanistic predictions for individual top-scoring SNPs Functional roles of 1000s of disease-associated SNPs Systematic manipulation of 2000+ human enhancers Test effect of single-motif and single-nucleotide disruptions Role of activator/repressor motifs, disease-associated SNPs Personal genomes/epigenomes in health and disease Allele-specific activity.alzheimer s brain methylation SNP Global repression of distal enhancers. NRSF, ELK1, CTCF

Understanding human variation and human disease Gene annotation (Coding, 5 /3 UTR, RNAs) Evolutionary signatures Non-coding annotation Chromatin signatures CATGACTG CATGCCTG Disease-associated variant (SNP/CNV/ ) Roles in gene/chromatin regulation Activator/repressor signatures Other evidence of function Signatures of selection (sp/pop) Challenge: from loci to mechanism, pathways, drug targets Goal: A systems-level understanding of genomes and gene regulation: The regulators: Transcription factors, micrornas, sequence specificities The regions: enhancers, promoters, and their tissue-specificity The targets: TFs targets, regulators enhancers, enhancers genes The grammars: Interplay of multiple TFs prediction of gene expression The parts list = Building blocks of gene regulatory networks

Collaborators and Acknowledgements ENCODE Brad Bernstein, Tarjei Mikkelsen, Noam Shoresh, David Epstein Massively parallel enhancer reporter assays Tarjei Mikkelsen, Broad Institute Epigenome Roadmap Bing Ren, Brad Bernstein, John Stam, Joe Costello 2X mammals Kerstin Lindblad-Toh, Eric Lander, Manuel Garber, Or Zuk Funding NHGRI, NIH, NSF Sloan Foundation

Daniel Marbach Mike Lin Jason Ernst Jessica Wu Rachel Sealfon Pouya Kheradpour (#187) Manolis Kellis Chris Bristow Loyal Goff Irwin Jungreis MIT Computational Biology group Compbio.mit.edu Sushmita Roy #331: Luke Ward Stata4 Stata3 Louisa DiStefano Dave Hendrix Angela Yen Ben Holmes Soheil Feizi Mukul Bansal #19:Bob Altshuler Stefan Washietl Matt Eaton