Tales from the Dark Side of Your Genome


1 Tales from the Dark Side of Your Genome. Gill Bejerano, Dept. of Developmental Biology and Dept. of Computer Science, Stanford University 1

2 Biology has become a quantitative science: strings, time series, circuits 2

3 Biology has also become a meeting place for physicists, mathematicians, engineers, biologists, computer scientists, and more 3

4 AYBABTU 4

5 Genomics in a nutshell 5

6 DNA: Functional and Non-Functional. DNA = linear molecule that carries instructions for making living organisms ~ long string(s) over a small alphabet. Alphabet of four: {A,C,G,T}. Strings of length ACGTACGACTGACTAGCATCGACTACGACTAGCAC... [junk DNA] [genetic instructions: how to... when to... where to...] [junk DNA] 6

7 One Cell, One Genome, One Replication. Every cell holds a copy of all its DNA = its genome. The genome is replicated at every cell division. The human body is made of ~10^14 cells. All originate from a single cell through repeated cell divisions. [Figure: DNA string inside a cell; egg to chicken to egg, with the egg's DNA copied at each cell division] 7
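
As a back-of-the-envelope check on the numbers above, the sketch below asks how many rounds of division it takes to get from one cell to ~10^14. It assumes every cell divides in every round and none ever dies, an idealization that is not on the slide; the function name is illustrative.

```python
import math

TARGET_CELLS = 10**14  # approximate human cell count quoted on the slide


def doublings_needed(target_cells: int) -> int:
    """Smallest number of synchronous doublings that reaches target_cells from one cell."""
    return math.ceil(math.log2(target_cells))


rounds = doublings_needed(TARGET_CELLS)
print(f"about {rounds} rounds of division, i.e. {rounds} successive genome copies on any one lineage")
```

Under these assumptions, any single cell lineage copies its genome only a few dozen times, even though the total number of copies made across the body is enormous.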

8 Genes = How to make Proteins. Proteins: the workhorses of every living cell. [Figure: a gene on the DNA, inside a cell] 8

9 DNA Replication is Imperfect. Medium scale: substrings are duplicated, deleted, inverted. Large scale: whole DNA strings are duplicated, deleted. [Figure: a functional substring of ...acgtacgactgactagcatcgactacga..., flanked by junk, undergoes substring duplication (...acgtacgactgactagcatcgactacga...tctgactagcatcgactacga...); the two functional copies then undergo functional divergence] So... More Genes... More Complexity!... Right? 9
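
The medium-scale events named on this slide can be written as plain string operations. This is a minimal sketch, not a model of any real mutational mechanism: the function names and half-open `[start, end)` coordinates are illustrative choices, and inversion is modeled as replacing a segment with its reverse complement.

```python
# Lowercase DNA letters mapped to their complements.
COMPLEMENT = str.maketrans("acgt", "tgca")


def duplicate(seq: str, start: int, end: int) -> str:
    """Tandem-duplicate seq[start:end] immediately after itself."""
    return seq[:end] + seq[start:end] + seq[end:]


def delete(seq: str, start: int, end: int) -> str:
    """Remove seq[start:end]."""
    return seq[:start] + seq[end:]


def invert(seq: str, start: int, end: int) -> str:
    """Replace seq[start:end] with its reverse complement."""
    segment = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


slide_string = "acgtacgactgactagcatcgactacga"  # example string from the slide
with_copy = duplicate(slide_string, 8, 28)     # second copy of the last 20 letters
print(len(slide_string), len(with_copy))
```

After a duplication, either copy can accumulate changes while the other keeps the original function, which is the "functional divergence" step in the figure.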

10 1. Gene number does not correlate with Complexity. Gene families are important. Many are surprisingly old. But... Pre-genomic era: 100,000 genes to the human genome. [Chart: # genes vs. cells for fly, worm, human, weed, fish, rice] 10

11 DNA Replication is Imperfect (contd). Small scale: single letters are substituted, erased, added. In junk DNA, anything goes; in functional DNA, many changes are not tolerated. Thus, sequence conservation over generations implies function! [Figure: ...acgtacgactgactagcatcgactacga... passed from chicken to egg to chicken, picking up single-letter changes] 11

12 Sequence Conservation implies Function. Comparative genomics of distantly related species (human and mouse, both descended from a mammalian ancestor):
human ...ctttgcga-tgagtagcatctactattt...
mouse ...acgtgggactgacta-catcgactacga...
Conserved stretch = functional region! (but which function/s?...) 12
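
To make "conservation" concrete, the sketch below scores the two aligned rows shown on this slide column by column. It assumes the rows are already aligned, with `-` marking a gap, and it counts identity only over ungapped columns; that convention and the function name are choices made here, not something the slide specifies.

```python
def identity(row_a: str, row_b: str) -> tuple[int, int]:
    """Return (matching columns, ungapped columns) for two equal-length aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = compared = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # skip columns where either species has a gap
        compared += 1
        matches += a == b
    return matches, compared


human = "ctttgcga-tgagtagcatctactattt"
mouse = "acgtgggactgacta-catcgactacga"
matches, compared = identity(human, mouse)
print(f"{matches}/{compared} ungapped columns identical ({100 * matches / compared:.0f}%)")
```

A single number like this says how similar a region is, but not why; stretches far more similar than neighboring junk DNA are the candidates for function.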

13 2. Human Genome full of Conserved Non-Coding Elements. Human Genome: 3*10^9 letters; 1.5% known function; >50% junk. Compare to other species: >5% of the human genome is functional, 3x more functional DNA than known! ~10^6 substrings do not code for protein. What do they do then? [Science 2004 Breakthrough of the Year, 5th runner up] 13
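
The "3x" on this slide follows directly from the two percentages. The sketch below redoes the arithmetic, treating ">5%" as exactly 5%, so the results are lower bounds; variable names are illustrative.

```python
GENOME_LETTERS = 3 * 10**9   # genome size quoted on the slide
KNOWN_PERCENT = 1.5          # percent of the genome with known function
FUNCTIONAL_PERCENT = 5.0     # slide says ">5%", so this is a lower bound


def letters(percent: float, genome: int = GENOME_LETTERS) -> float:
    """Number of genome letters covered by the given percentage."""
    return genome * percent / 100


known = letters(KNOWN_PERCENT)              # roughly 4.5e7 letters
functional = letters(FUNCTIONAL_PERCENT)    # roughly 1.5e8 letters
fold = FUNCTIONAL_PERCENT / KNOWN_PERCENT   # a bit over 3
unexplained = functional - known            # conserved DNA with no known role

print(f"known: {known:.2e}, functional: {functional:.2e}, fold: {fold:.1f}x")
print(f"functional but unexplained: at least {unexplained:.2e} letters")
```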

14 Gene regulation = when/where to make protein. Unicellular: gene (how to) and control region (when & where) on the DNA; recognition site ~10 letters/protein; effective region ~10^3 letters 14

15 Vertebrate Gene Regulation. Multicellular: effective region ~10^6 letters!!! Gene (how to), control region (when & where), DNA (~10^3 letters) 15

16 3. Most Non-Coding Elements are likely cis-regulatory. IRX1 is a member of the Iroquois homeobox gene family. Members of this family appear to play multiple roles during pattern formation of vertebrate embryos. Gene deserts: regulatory jungles. [Figure: 9Mb region] 16

17 The Writing on the Wall gene deserts 25,000 regulatory jungles 1,000,

18 DNA Conservation levels Conserved elements between human and mouse are on average 85% identical. [mouse consortium, 2002] [Bejerano et al., Science 2004] 18

19 Ultraconserved Elements fish [Bejerano et al., Science 2004] 19

20 Ultraconserved Elements. Hundreds of long substrings are identical between human and birds: they must have rejected many different changes. But... all functions we understand in our genome are encoded using redundant codes. E.g. protein-coding genes: DNA, 10^8 letters over an alphabet of 4; protein, 10^2 letters over an alphabet of 20; coding: 3 DNA letters → 1 protein letter. [Bejerano et al., Science 2004] 20
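
One way to picture "long identical substrings" is as runs of alignment columns in which every species carries the same letter and no gap. The sketch below scans a toy multiple alignment for such runs. It is an illustration, not the pipeline of the cited paper: the function name and rows are made up, and the minimum length is a free parameter (the 2004 paper used 200 bp on genome-scale alignments).

```python
def identical_runs(aligned_rows: list[str], min_len: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) column ranges where all rows agree, gap-free."""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")
    runs, start = [], None
    for col in range(width + 1):  # one extra step flushes a run ending at the last column
        same = (
            col < width
            and aligned_rows[0][col] != "-"
            and all(row[col] == aligned_rows[0][col] for row in aligned_rows[1:])
        )
        if same and start is None:
            start = col
        elif not same and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    return runs


toy_alignment = ["acgtacgt", "acgaacgt", "acgtacgt"]  # hypothetical three-species toy rows
print(identical_runs(toy_alignment, min_len=3))
```

The redundancy argument on the slide is what makes long runs surprising: with 4^3 = 64 three-letter DNA words for 20 protein letters, many DNA changes leave the protein unchanged, so even essential coding sequence is expected to drift at some positions.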

21 No known function requires this much conservation? [Figure: conservation profiles for CDS, ncRNA, TFBS sequence] 21

22 What do they do? 22

23 Genomic Distribution of Ultraconserved Elements. [Legend: exonic, non-exonic, possibly exonic] 23

24 Annotation by Association. Measure correlation between genomic regions and annotation. [Figure: distances d from regions along the genome to annotated features; a heterogeneous body of knowledge yields a testable hypothesis] 24
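
A minimal sketch of the idea, under assumptions that are not on the slide: each region is linked to its nearest gene on the same chromosome by distance d, and the annotation terms of those genes are tallied. All coordinates, gene names and terms below are hypothetical, and a real analysis would also need a statistical test against a background.

```python
from bisect import bisect_left
from collections import Counter


def nearest_gene(position: int, genes: list[tuple[int, str]]) -> tuple[int, str]:
    """genes: non-empty list of (coordinate, name) sorted by coordinate. Returns (distance d, name)."""
    i = bisect_left(genes, (position,))
    candidates = genes[max(i - 1, 0): i + 1]  # closest gene on each side, where present
    return min((abs(position - coord), name) for coord, name in candidates)


def associate(regions: list[int], genes: list[tuple[int, str]], terms: dict[str, set[str]]) -> Counter:
    """Count annotation terms of the gene nearest to each region."""
    counts = Counter()
    for position in regions:
        _, gene = nearest_gene(position, genes)
        counts.update(terms.get(gene, ()))
    return counts


# Hypothetical toy data for illustration only.
genes = [(100, "geneA"), (1000, "geneB")]
terms = {"geneA": {"development"}, "geneB": {"metabolism"}}
print(associate([90, 400, 900], genes, terms))
```

A term that shows up much more often than expected by chance among the nearest genes becomes the testable hypothesis for what the regions do.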

25 Ultras are Functional. Back in 2004 we hypothesized, for the 481 ultraconserved elements: exonic subset → post-transcriptional regulation [Ni et al., Genes Dev.; Lareau et al., Nature, 2007]; nonexonic subset → transcriptional regulators [Pennacchio et al., Nature, 2006] 25

26 Repeat made Regulatory Region. [Figure: transgenic reporter construct (conserved element, minimal promoter, reporter gene), read out in situ] 26

27 Zoom to uc.351, 225Kb upstream of DACH. [Figure: ultraconserved element; e.d 12.5] [Nobrega et al., 2003] 27

28 A Vertebrate Innovation? Only 24 ultras can be partially traced back through direct sequence search to Ciona, C. elegans or Drosophila. All overlap coding exons from known genes (17 of which show clear evidence of alt-splicing, incl. EIF2C1, DDX, BCL11A, EVI1, ZFR, CLK4, HNRPH1, GRIA3). No intronic element in human was found to be coding in another species, although in some cases EST evidence indicates intron retention, presumably not as CDS. Interestingly, ribosomal DNA (not part of the draft genomes) also harbors 6 ultraconserved elements in 18S, 28S. 28

29 Similar Phenomena in Flies: rich in conserved non-coding sequence; fly-specific ultraconserved elements. [Siepel, Bejerano et al., Genome Research 2005] [Glazov,..., Bejerano, Mattick, Genome Research 2005] 29

30 Genomic Distribution of Ultraconserved Elements. [Legend: exonic, non-exonic, possibly exonic] 30

31 Repeats / Mobile Elements ("selfish DNA"). Human Genome: 3*10^9 letters; 1.5% known function; >50% junk 31

32 Cis-reg & Ultra elements from Mobile Elements. Co-option event, probably due to favorable genomic context. All other copies are destined to decay over time at a neutral rate. [Yass is a small town in New South Wales, Australia.] [Bejerano et al., Nature 2006] 32

33 Exapted Into Which Cellular Roles? No evidence for transcription (Tx) as small RNAs, no orientation preference in introns, not in antisense Tx. Human instances cluster together, found <1Mb from 35 TFs (P<3*10^-6). 33

34 Repeat made Regulatory Region. [Figure: transgenic reporter construct (conserved element, minimal promoter, reporter gene), read out in situ] 34

35 Co-option into Different Roles. [Figure: a repeat co-opted as protein-coding sequence or as gene-regulating sequence] 35

36 Relation to Human Disease SHH 1Mb LMBR1 Limb Lettice et al. HMG : [Derti et al., Nature Genetics, 2006] 36

37 Ultras are Under Strong Human Selection. [Figure: derived allele frequency (DAF) in ultras vs. DAF at nonsynonymous sites] [Katzman et al., Science, 2007] 37

38 Ultraconserved Non-coding RNA. About 1/3 of all ultras are expressed. Some are predicted to provide microRNA targets. A few are anti-correlated with miRNA expression levels. A few even act as oncogenes. [Figure: miRNA complementarity] [Calin et al., Cancer Cell, 2007] 38

39 Touch an Ultra And You? [Slide filled with a wall of raw genomic DNA sequence] 39

40 Touch an Ultra And You - DIY. Nadav Ahituv, Eddy Rubin, LBNL 40

41 Complete the Sentence: Ultra KO Mice Are [Ahituv et al, 2007]

42 Unchangeable but expendable? Functional: Under Strong Selection: Selection coeff. And expendable?? 42

43 Rodent-Specific Losses. Primate-dog non-exonic elements lost in mouse & rat (in <1000bp deletions). [Legend: all, neutral, functional] [International Mouse Genome Sequencing Consortium, Nature, 2002] Ultras are ~300-fold more persistent than neutral 43

44 What we do understand... Ultraconserved elements exist. They are maintained via strong on-going selection. It is a heterogeneous bunch: some mediate splicing; some regulate gene expression; some express ncRNAs (categories are not necessarily mutually exclusive). Knockouts of four regulatory ultras do not lead to severe phenotypes (similar protein cases: Pbx2, Nkx6.2, Gli1) 44

45 What we don't understand. Their functional density: How did they come to be? What is the selective advantage that lets them persist? 45

46 Kudos. Bejerano Lab: Cory McLean, Abraham Bassan, Shoa Clarke, Edward Chuong, Fah Sathirapongsasuti (fall quarter at a classroom near you..). UCSC: David Haussler, Craig Lowe, Jim Kent, Sofie Salama, the lot. LBNL: Eddy Rubin. UCSF: Nadav Ahituv. Edward Mallinckrodt, Jr Foundation 46