Intro to population genetics
- Alyson Flowers
- 5 years ago
Transcription
1 Intro to population genetics Shamil Sunyaev Department of Biomedical Informatics Harvard Medical School Broad Institute of M.I.T. and Harvard
2 Forces responsible for genetic change: mutation (µ), selection (s), drift (N_e), population structure (F_ST)
3 Mutations
4 Mutation rate in humans and flies. Point mutations: 2.5×10^-8 (Nachman & Crowell), 1.8×10^-8 (Kondrashov); NGS estimates ~1.2×10^-8 per nt, ~10^2 changes per genome. Other events: indels (10^-9), repeat extensions/contractions (10^-5), large events (?)
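The step from a per-nucleotide rate to roughly 10^2 changes per genome is simple arithmetic. The sketch below assumes a haploid human genome of about 3.2×10^9 nt, a commonly quoted approximate size that does not appear on the slide, and counts both copies of a diploid genome.

```python
# Assumed, approximate figure (not from the slides):
HAPLOID_GENOME_NT = 3.2e9   # rough size of the human haploid genome
# NGS-based estimate quoted on the slide:
MU_PER_NT = 1.2e-8          # point mutations per nucleotide per generation


def new_mutations_per_genome(mu_per_nt, genome_nt, ploidy=2):
    """Expected number of new point mutations per generation."""
    return mu_per_nt * genome_nt * ploidy


expected = new_mutations_per_genome(MU_PER_NT, HAPLOID_GENOME_NT)
print(f"about {expected:.0f} new point mutations per diploid genome per generation")
```

Under these assumptions the result is several tens of new mutations, which agrees with the order of magnitude given on the slide.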
5 Mutation rate is variable along the genome: replication fidelity; DNA damage; DNA repair; CpG deamination; regional variation of mutation rate; context dependence of mutation rate
6 Genetic drift
7 Drift is a random change of allele frequencies
8 Drift depends on population size
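A minimal sketch of why drift depends on population size, assuming a neutral Wright-Fisher model (a standard textbook model, not named on the slides). The function names, population sizes, and replicate counts are illustrative choices only.

```python
import random


def wright_fisher(n_diploid, p0, generations, rng):
    """Return an allele-frequency trajectory under pure drift."""
    copies = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each of the 2N gene copies in the next generation is drawn
        # independently at the current allele frequency (binomial sampling).
        count = sum(rng.random() < p for _ in range(copies))
        p = count / copies
        trajectory.append(p)
    return trajectory


def mean_abs_change(n_diploid, replicates=100, generations=20, seed=1):
    """Average absolute frequency change after a fixed number of generations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        trajectory = wright_fisher(n_diploid, 0.5, generations, rng)
        total += abs(trajectory[-1] - 0.5)
    return total / replicates


small = mean_abs_change(n_diploid=20)
large = mean_abs_change(n_diploid=200)
print(f"mean |change| after 20 generations: N=20 -> {small:.3f}, N=200 -> {large:.3f}")
```

Starting from the same frequency, the smaller population wanders further in the same number of generations, because the sampling variance per generation is p(1-p)/(2N).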
9 Demographic history
10
11 Selection
12 Most functional mutations are deleterious. [Figure: new mutations split into functional and nonfunctional; selective effect of mutation: deleterious, neutral, advantageous.] Selection indicates functional mutations, whether or not the tested trait is under selection.
13 Methods of mathematical population genetics
14 Dynamics of allelic substitution: mathematically, allele frequency change in a population follows a one-dimensional random walk. [Figure: allele frequency between 0 and 1 plotted against time]
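15 Diffusion approximation: a random walk that does not jump long distances can be approximated by a diffusion process:
$$\frac{\partial \phi(x,p,t)}{\partial t} = -\frac{\partial}{\partial x}\left[M\,\phi(x,p,t)\right] + \frac{1}{2}\,\frac{\partial^{2}}{\partial x^{2}}\left[V\,\phi(x,p,t)\right]$$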
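One classical result obtained from the diffusion approximation is Kimura's fixation probability. The sketch below assumes additive (genic) selection, so that the mean change per generation is M = s·x(1-x) and the variance is V = x(1-x)/(2N_e); these specific forms and the parameter values are assumptions for illustration, not taken from the slide.

```python
import math


def fixation_probability(p, n_e, s):
    """Kimura's diffusion result for an allele at frequency p.

    Assumes additive selection with coefficient s and effective size n_e.
    Falls back to the neutral limit u(p) = p when 4*n_e*s is negligible.
    """
    a = 4.0 * n_e * s
    if abs(a) < 1e-12:
        return p
    # expm1 keeps precision when the exponent is small.
    return math.expm1(-a * p) / math.expm1(-a)


N = 1000                   # hypothetical population size
p_new = 1.0 / (2 * N)      # frequency of a single new mutation

neutral = fixation_probability(p_new, N, 0.0)
advantageous = fixation_probability(p_new, N, 0.001)
deleterious = fixation_probability(p_new, N, -0.001)
print(neutral, advantageous, deleterious)
```

A new neutral mutation fixes with probability equal to its initial frequency; positive selection raises that probability and negative selection lowers it, even when 4·N_e·s is only of order a few units.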
16 Coalescent theory: instead of modeling a population, we can model our sample. Time goes backwards.
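A minimal sketch of modeling the sample rather than the population, assuming the standard neutral coalescent for a constant-size population (an assumption; the slide does not specify the model). While k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2, with time measured in units of 2N generations.

```python
import random


def coalescent_times(n, rng):
    """Waiting times while k = n, n-1, ..., 2 lineages remain (units of 2N generations)."""
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2   # number of lineage pairs that can coalesce
        times.append(rng.expovariate(rate))
    return times


def mean_tmrca(n, replicates, seed=0):
    """Monte Carlo estimate of the mean time to the most recent common ancestor."""
    rng = random.Random(seed)
    total = sum(sum(coalescent_times(n, rng)) for _ in range(replicates))
    return total / replicates


sample_size = 10
simulated = mean_tmrca(sample_size, replicates=20000)
theoretical = 2.0 * (1.0 - 1.0 / sample_size)   # E[TMRCA] = 2(1 - 1/n)
print(f"simulated {simulated:.3f}, theoretical {theoretical:.3f}")
```

Only n-1 exponential draws are needed per genealogy, whatever the population size, which is the practical appeal of working backwards in time from the sample.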
17 Natural selection in protein coding regions
18 Signatures of purifying selection: reduced variation; excess of rare alleles
19
20 Effect of new missense mutations
21
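22 Computer simulations. [Figure labels: demographic history; natural selection; time]
$$\frac{\partial \phi(x,p,t)}{\partial t} = -\frac{\partial}{\partial x}\left[M\,\phi(x,p,t)\right] + \frac{1}{2}\,\frac{\partial^{2}}{\partial x^{2}}\left[V\,\phi(x,p,t)\right]$$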
23
24 Can we find additional evidence in sequence data? Is there any information beyond frequency? Can we tell alleles under selection from neutral alleles if they are of the same frequency?
25
26 At a given frequency, deleterious and advantageous alleles are younger than neutral alleles. Maruyama effect (1974): at any frequency, advantageous or deleterious alleles are younger than neutral alleles. [Figure: allele frequency trajectories from 0% to frequency x over time]
27 Intuition: shorter trajectories require fewer lucky jumps. [Figure: frequency against time, from 0% to frequency x; shorter trajectory: 4 jumps; longer trajectory: 6 jumps]
28 Diffusion theory: deleterious alleles pass fast through higher frequencies. Neutral alleles spend equal time at each frequency; selected alleles pass faster through higher frequencies. [Figure: allele frequency against time] Idea: low accumulation of mutations at linked sites indicates selection.
29 [Figure: mean sojourn time (2N generations) against intermediate allele frequency (%), for selection coefficients 2Ns = 0 (neutral), 2 (weakly deleterious), 10 (deleterious)]
30 Neighborhood clock (fuzzy clock). [Figure labels: closest variant beyond recombination event; variant; closest rarer linked variant]
31 Neighborhood clock is consistent with Maruyama-effect expectations. [Figure: proportion with NC <= x against the NC statistic, MAC = 3; curves: missense ancestral, missense derived, probably damaging derived, synonymous derived. Legend with counts: coding-synon 7486, missense 9728, benign 4464, possibly damaging 1656, probably damaging 2654; baseline.] Data: pilot Genome of Netherlands dataset
32 Can signals of selection guide prioritization? Genes of interest should be highly selectively constrained. The ExAC dataset combines exomes of >60,000 individuals. Several methods to estimate gene-based selective constraint exist (pLI, RVIS). Can we estimate fitness loss directly?
33 Selection inference using frequencies of individual SNPs. Change in allele frequency = Mutation + Selection + Drift. Mutation: of the order of 10^-8. Drift: demographic history, population structure.
34 Focusing on rare deleterious PTVs. PTV: protein truncating variant (a.k.a. nonsense). Combine all PTVs per gene: we assume that they have identical effects. Consider each gene as a bi-allelic locus: PTV / no PTV.
35 Selection inference using combined frequency of PTVs. Change in allele frequency = Mutation + Selection + Drift. Assuming strong selection and a very large population, the combined frequency of rare deleterious PTVs is expected to be Poisson distributed with λ = U/(hs).
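A rough sketch of where U/(hs) comes from, assuming a deterministic recursion in which selection removes a fraction hs of PTV copies each generation (rare alleles sit almost entirely in heterozygotes) and mutation adds new PTVs at rate U per gene copy. The numerical values, the sample size of 120,000 chromosomes, and the step of scaling the frequency by the number of sampled chromosomes to get an expected count are illustrative assumptions, not figures from the slides.

```python
import math


def equilibrium_frequency(u, hs, generations=5000):
    """Iterate q' = q(1 - hs) + u(1 - q) from q = 0 until it settles."""
    q = 0.0
    for _ in range(generations):
        q = q * (1.0 - hs) + u * (1.0 - q)
    return q


def poisson_pmf(k, lam):
    """Poisson probability of observing k events with mean lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


U = 1e-6                 # hypothetical PTV mutation rate per gene copy
HS = 0.01                # hypothetical heterozygous selection coefficient
N_CHROMOSOMES = 120_000  # hypothetical sample size in chromosomes

q_eq = equilibrium_frequency(U, HS)
lam = N_CHROMOSOMES * U / HS   # expected PTV count in the sample
print(f"equilibrium frequency {q_eq:.3e} vs U/(hs) = {U / HS:.3e}")
print(f"P(no PTV observed in the sample) = {poisson_pmf(0, lam):.2e}")
```

The recursion settles at U/(hs + U), which is indistinguishable from U/(hs) whenever U is much smaller than hs, the strong-selection regime the slide assumes.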
36 Simulations: drift becomes non-negligible. [Figure: simulations at a given U; panels for large s_het and small s_het]
37 The model: PTV counts in each gene are Poisson distributed, but we lack sufficient data to estimate selection coefficients. We can treat selection coefficients as random variables with a distribution to be estimated.
38 Distribution of selection coefficients: P(s_het; α, β). [Figure axis: heterozygous selection coefficient, s_het]
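39 Estimates for each gene: the estimated distribution over selection coefficients can now be used as a prior, and per-gene estimates are obtained from the posteriors:
$$P\left(s_{het,i} \mid n_i; \nu_i\right) = \frac{P\left(n_i \mid s_{het,i}; \nu_i\right)\, P\left(s_{het,i}; \hat{\alpha}, \hat{\beta}\right)}{\int P\left(n_i \mid s; \nu_i\right)\, P\left(s; \hat{\alpha}, \hat{\beta}\right)\, ds}$$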
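A minimal sketch of a per-gene posterior computed on a grid. It assumes a Poisson likelihood for the observed PTV count n with mean ν/s, where ν is the expected number of PTV mutations in the sample. The prior used here is a placeholder (uniform over log-spaced grid points), not the fitted parametric distribution from the slides, and all numbers are hypothetical.

```python
import math


def poisson_logpmf(n, lam):
    """Log of the Poisson probability of n events with mean lam."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def posterior_on_grid(n_obs, nu, grid, prior_weights):
    """Normalized posterior over grid values of s, given n_obs PTVs."""
    log_post = [poisson_logpmf(n_obs, nu / s) + math.log(w)
                for s, w in zip(grid, prior_weights)]
    peak = max(log_post)                       # subtract the peak to avoid underflow
    unnormalized = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnormalized)                  # discrete stand-in for the integral over s
    return [u / total for u in unnormalized]


def posterior_mean(n_obs, nu, grid, prior_weights):
    post = posterior_on_grid(n_obs, nu, grid, prior_weights)
    return sum(s * w for s, w in zip(grid, post))


# Hypothetical setup: 200 log-spaced values of s between 1e-4 and 1.
grid = [10 ** (-4 + 4 * i / 199) for i in range(200)]
prior = [1.0 / len(grid)] * len(grid)          # placeholder prior, an assumption
NU = 0.12                                      # hypothetical expected PTV mutations in the sample

few_ptvs = posterior_mean(0, NU, grid, prior)
many_ptvs = posterior_mean(20, NU, grid, prior)
print(f"posterior mean s: n=0 -> {few_ptvs:.3f}, n=20 -> {many_ptvs:.4f}")
```

Genes with fewer observed PTVs than expected are pushed toward larger selection coefficients; replacing the placeholder prior with a distribution fitted across all genes would give the empirical Bayes estimate the slide describes.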
40 What happens if we incorporate drift?
41 Fraction of AD and AR Mendelian genes. [Figure panels: [c] Mode of inheritance in molecular diagnoses (Baylor); [d] Baylor s_het bins; [e] UCLA s_het bins. Axes: s_het bin; number of observed genes; fraction of genes by mode of inheritance (AD, AR), split at s_het < 0.04 and s_het > 0.04]
42 Age of onset, penetrance and severity
43 Concordance with mouse knockout data. [Figure panels: [a] Orthologous mouse knockouts by phenotype; [b] Distribution of s_. Axes: s_het bin; percentage of genes in each bin, by phenotype (lethal, subviable, viable)]
44 Concordance with cell essentiality screens. [Figure panels: [c] Cell-essential by KBM7 CRISPR assay; [d] Cell-essential by yeast gene trap assay. Axes: s_het bin; percentage of genes classified as essential]
45 Black hole in knowledge. [Figure: percentage of genes in each group, from fewest to most publications, for non-viable Sanger mice, KBM7 human cell line, protein-protein interactions, s_het value, and yeast gene trap score]