Selective constraints on noncoding DNA of mammals. Peter Keightley Institute of Evolutionary Biology University of Edinburgh


1 Selective constraints on noncoding DNA of mammals Peter Keightley Institute of Evolutionary Biology University of Edinburgh

2 Most mammalian noncoding DNA evolves rapidly [Figure: Homo-Pan divergence (%) for nonsynonymous, 4-fold, intronic and intergenic sites.] Mammalian genomes are >98.5% noncoding.

3 Questions What fraction of sites in the genome are subject to selection? Where are the functionally important noncoding DNA elements situated? Many of these elements are assumed to control gene expression.

4 Methodology Comparative: conservation beyond that expected under neutrality implies selective constraint and functionality. Rooted in the neutral theory: mutations are considered to be either strongly deleterious or neutral. Requires standardizing evolutionary divergence against neutrally evolving sequences. Ignores adaptive substitutions. Needs to account for mutation rate variation. Comparing within-species polymorphism with between-species divergence can quantify the rate of adaptive substitution.

5 Conservation of mammalian noncoding DNA (A) Conservation is particularly apparent near coding sequences. Example: intergenic DNA upstream of coding sequences is conserved between human and mouse. [Figure: conservation score; Shabalina et al. (2001).]

6 Conservation of mammalian noncoding DNA (B) Conserved noncoding sequences (CNSs) distant from coding sequences: Is sequence conservation above that expected under neutrality?

7 Quantifying selective constraints from between-species sequence divergence
Neutrally evolving sequence (divergence = mutation rate):
ATGAACGCCTACAGCATCCC
ACGATCGCCTTCAGAATTCC
Functional sequence (divergence < mutation rate):
GCAGCGACCATGCAGTCACG
GCAGCGACCATACAGTCACG
Observed divergence of functional sequence = 1
Expected divergence of functional sequence = 5
Constraint = 1 - Observed/Expected = 1 - 1/5 = 0.8
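The arithmetic on this slide can be reproduced with a short Python sketch. The function names are illustrative and not from the talk, and divergence is counted as raw mismatches with no correction for multiple hits, an assumption that only suits a toy example like this one.

```python
def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def constraint(observed, expected):
    """Constraint = 1 - observed/expected divergence."""
    return 1.0 - observed / expected


# Sequence pairs taken from the slide (all four are 20 sites long).
neutral = ("ATGAACGCCTACAGCATCCC", "ACGATCGCCTTCAGAATTCC")
functional = ("GCAGCGACCATGCAGTCACG", "GCAGCGACCATACAGTCACG")

expected = count_differences(*neutral)      # neutral pair sets the expectation
observed = count_differences(*functional)   # functional pair is the one tested
print(expected, observed, round(constraint(observed, expected), 2))
```

Because both pairs have the same length, raw difference counts can be compared directly; with unequal lengths the counts would first need to be converted to per-site divergences.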

8 Talk plan Part 1. A comparison of the levels of selective constraints between murids and hominids in: (A) Noncoding regions close to genes (B) CNSs Part 2. Genome-wide quantification of the fraction of sites that are subject to selection in murids. Inference of the genomic deleterious mutation rate, U, in murids.

9 Study organisms Hominids: Homo sapiens, Pan troglodytes; ~6 Myr, D = [missing]. Murids: Mus musculus, Rattus norvegicus; ~13 Myr, D = 0.15.

10 (A) Analysis of selective constraints in intergenic DNA and first introns Constraint estimated using introns as the neutrally evolving reference. First introns and splice control regions excluded. Non CpG-prone sites analysed. Mean constraint calculated in blocks equidistant from the start or stop codon (e.g., 500 base pair blocks). Well-annotated hominid and murid genes. Aligned by MCALIGN.
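The block-wise averaging described on this slide might look roughly like the following Python sketch. The input layout (one record per aligned site), the function name and the toy data are all hypothetical, and the neutral divergence is passed in as a single number rather than estimated from introns as in the talk.

```python
from collections import defaultdict


def constraint_by_block(sites, neutral_divergence, block_size=500):
    """Mean constraint per distance block.

    sites: iterable of (distance_bp, differs) pairs, one per aligned site,
           where differs is True if the two species differ at that site.
    neutral_divergence: per-site divergence of the neutral reference.
    """
    tallies = defaultdict(lambda: [0, 0])  # block start -> [differences, sites]
    for distance_bp, differs in sites:
        block_start = (distance_bp // block_size) * block_size
        tallies[block_start][0] += int(differs)
        tallies[block_start][1] += 1

    constraints = {}
    for block_start in sorted(tallies):
        differences, n_sites = tallies[block_start]
        expected = neutral_divergence * n_sites  # differences expected if neutral
        constraints[block_start] = 1.0 - differences / expected
    return constraints


# Toy data: 10 differences in the first 500 bp, 50 in the next 500 bp.
toy_sites = [(d, d % 50 == 0) for d in range(500)]
toy_sites += [(d, d % 10 == 0) for d in range(500, 1000)]
print(constraint_by_block(toy_sites, neutral_divergence=0.1))
```

With a neutral divergence of 0.1, the first toy block is strongly constrained and the second evolves at the neutral rate, which mirrors the decline of constraint with distance from the coding sequence.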

11 Results: mean selective constraint in intergenic regions [Figure: two panels, hominids and murids; constraint plotted against distance from coding sequence (bp).]

12 Results (cont.): mean selective constraint in first introns [Figure: two panels, hominids and murids; constraint plotted against distance from coding sequence (bp).]

13 Mean constraint estimates (±SE) for 2kb blocks
DNA category     Hominids        Murids
5' intergenic         (0.019)    0.17 (0.016)
3' intergenic         (0.018)    0.19 (0.017)
Intron                (0.019)    0.16 (0.018)
Nonsynonymous    0.76 (0.013)    0.90 (     )
In hominids, constraint is absent in 5' flanking sequences and intron 1, and is >50% lower in the 3' flank than in murids. At nonsynonymous sites, the difference in constraint is significant, but not as large. A likely explanation is the fixation of mildly deleterious mutations in hominids.

14 Fixation of mildly deleterious mutations Assumed to be a consequence of a low effective population size (N_e) in hominids. The fate of a deleterious mutation depends on N_e s. N_e s < -1: rarely fixed. N_e s > -1: nearly neutral mutations, fixation probability 1/(2N_e). [Figure: fixation probability × 2N_e plotted against N_e s.]
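The dependence on N_e s can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation. That formula is not stated on the slide; it is assumed here, together with additive fitness effects (heterozygote fitness 1 + s) and a census size equal to N_e. The function name and the value of N_e are illustrative only.

```python
import math


def relative_fixation_probability(ne, s):
    """Fixation probability of a new mutation, scaled by 2*Ne (neutral = 1).

    Assumes Kimura's approximation: (1 - exp(-2s)) / (1 - exp(-4*Ne*s)).
    """
    if s == 0:
        return 1.0
    exponent = -4.0 * ne * s
    if exponent > 700:  # exp() would overflow; fixation is effectively impossible
        return 0.0
    # expm1(x) = exp(x) - 1; the two sign changes cancel in the ratio.
    p_fix = math.expm1(-2.0 * s) / math.expm1(exponent)
    return 2.0 * ne * p_fix


ne = 10_000  # hypothetical effective population size
for ne_s in (-5.0, -2.0, -1.0, -0.1, -0.01):
    rel = relative_fixation_probability(ne, ne_s / ne)
    print(f"Ne*s = {ne_s:6.2f}  relative fixation probability = {rel:.4f}")
```

Under these assumptions the scaled fixation probability stays close to 1 for very small |N_e s| and falls steeply as N_e s drops below about -1, which is the pattern the slide summarizes.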

15 (B) Analysis of selective constraints in conserved noncoding segments (CNSs) The set of the most strongly conserved, untranscribed genomic segments that do not have properties of exons (Dermitzakis et al., Science 302: 1033). The data set is selected, so conservation estimates are potentially upwardly biased. [Figure labels: H, C, R, M.]

16 CNS data set 2,262 human-mouse CNSs from human Chr 21. Average length 153 bp. Human-chimp and mouse-rat orthologues compiled. Constraint measured using flanking DNA as the reference, excluding other CNSs and coding sequences. [Diagram: 5' reference (-5kb to -3kb), CNS, 3' reference (+3kb to +5kb).]

17 Constraints in CNSs - Results
Species       Sequences   Constraint
Human-chimp   CNSs        0.30 (0.02)
Human-chimp   Flanks           (0.009)
Mouse-rat     CNSs        0.58 (0.007)
Mouse-rat     Flanks           (0.006)
More evidence for accumulation of mildly deleterious mutations in hominids?
[Figure: two panels, hominids and murids; constraint plotted against distance from CNS (bp).]

18 Conclusion Part 1 There is evidence for selective constraints in noncoding regions close to protein-coding genes in both hominids and murids. CNSs and their flanks also appear to be subject to selective constraints. Constraints are substantially lower in hominids than murids; a likely explanation for this is reduced effectiveness of selection in hominids.

19 Part 2 - Genome-wide quantification of selective constraints in murids (with Daniel Gaffney) Data: the complete genome sequences of mouse and rat. Assume that transposable elements (TEs) evolve close to neutrally. Calculate evolutionary constraints for all other sites in the genome using local TEs as a neutral standard, i.e., TEs within 1Mb blocks.

20 Genomic data subdivided into the following categories: Coding sequences; UTRs; Intron 1 (excluding TE sequences); Intron >1 (excluding TE sequences); Intergenic (excluding TE sequences); TEs in introns (SINEs, LINEs, LTR retrotransposons, DNA transposons); TEs in intergenic DNA (SINEs, LINEs, LTR retrotransposons, DNA transposons).

21 Evolutionary divergence by site category [Figure: divergence for nonsynonymous, 4-fold, UTR, intron 1, intron >1, intergenic, TEs (intron) and TEs (intergenic) sites.]

22 Evolutionary constraints in introns [Figure: constraint in intron 1 and intron >1 plotted against distance from the 5' splice site (kb) and from the 3' splice site (kb).]

23 Conclusions: constraints in introns Introns generally show weak, but detectable constraints. Constraints are strongest at the 5' end of intron 1. Constraint increases with distance from the 5' or 3' splice site. Weak positive correlation between intron length and constraint. Differences in composition between TEs and introns can only partially account for this pattern.

24 Evolutionary constraints in intergenic DNA [Figure: constraint plotted against distance from transcription start/stop (kb).] Evidence of moderate conservation deep into the intergenic DNA.

25 Comparison with Drosophila [Figure: two panels; murids, constraint against distance from transcription start/stop (kb); Drosophila, constraint against distance from coding sequence (bp).] Introns give a similar, though more complex, picture.

26 Calculation of U in Murids
Sequence type   Mean Constraint   Constrained Mb
Coding
UTR
Intron
Intergenic
Calculate u = mutation rate/base pair, assuming: mouse-rat divergence time = 13 Myr; two generations per year.
U = 2u(Σ constrained bases) = 0.91
Does not include a contribution from indels (including TEs).
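The structure of this calculation can be sketched in Python. Several things here are assumptions rather than content from the talk: that u is obtained as neutral divergence divided by twice the number of generations since the split, that the divergence D = 0.15 from slide 9 is the value used, and the function names. The constrained-base counts are placeholder numbers, since the table values are not available, so the printed U is not expected to match 0.91.

```python
def mutation_rate_per_generation(divergence, divergence_time_years, generations_per_year):
    """Per-site, per-generation mutation rate from neutral pairwise divergence.

    The factor of 2 accounts for substitutions accumulating on both lineages.
    """
    generations = divergence_time_years * generations_per_year
    return divergence / (2.0 * generations)


def genomic_deleterious_rate(u, constrained_bases):
    """U = 2 * u * (sum of constrained bases) for a diploid genome."""
    return 2.0 * u * sum(constrained_bases.values())


# From the slides: 13 Myr divergence time, two generations per year.
# ASSUMPTION: D = 0.15 (slide 9) is the neutral divergence used to get u.
u = mutation_rate_per_generation(0.15, 13e6, 2)

# HYPOTHETICAL placeholder counts (in bases), not the values from the talk.
constrained = {"coding": 10e6, "UTR": 10e6, "intron": 10e6, "intergenic": 10e6}

print(f"u = {u:.3e} per site per generation")
print(f"U = {genomic_deleterious_rate(u, constrained):.2f} (placeholder inputs)")
```

Each entry in the dictionary corresponds to one row of the table: mean constraint multiplied by the number of sites in that category.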

27 Summary Noncoding DNA evolves rapidly in murids and hominids. There is evidence of selective constraints in flanking regions, first introns and CNSs. Selective constraints in these regions are weaker in hominids, perhaps because of a difference in long-term N_e. Hominids may have accumulated large numbers of mildly deleterious mutations in gene expression control regions. Using TEs as a neutral standard reveals moderate selective constraints deep into intronic and intergenic DNA in murids.

28 Acknowledgments Dan Gaffney (Edinburgh), Adam Eyre-Walker (University of Sussex), Dan Halligan (Edinburgh)