Alkes Price Harvard School of Public Health January 24 & January 26, 2017


1 EPI 511, Advanced Population and Medical Genetics Week 1: Intro + HapMap / 1000 Genomes Linkage Disequilibrium Alkes Price Harvard School of Public Health January 24 & January 26, 2017

2 EPI 511: Course structure Week 1: HapMap, 1000G / Linkage disequilibrium Week 2: Population structure and admixture Week 3: Population stratification Week 4: Fine-mapping / Natural selection Week 5: Heritability / Genetic risk prediction Week 6: Mixed models / Rare variant analysis Week 7: Functional interpretation

3 EPI 511: How to address the instructor Alkes Dr. Price Professor Price Honorable Professor Price Honorable Distinguished Dr. Professor Price

4 EPI 511: Office Hours Instructor: Alkes Office Hours: Thu 3:30-4:30pm, Building 2, Room 211 Address: (Please put EPI511 in the subject of your email) Teaching Assistant: Armin Office Hours: Fri + Mon 2-3pm, Building 2, Room 209 Address: arminschoech@g.harvard.edu

5 EPI 511: Course components Advance reading 1 required paper + 1 optional paper per course session

6 EPI 511: Course components Advance reading 1 required paper + 1 optional paper per course session Lecture + Discussion

7 EPI 511: Course components Advance reading 1 required paper + 1 optional paper per course session Lecture + Discussion discussants: each student to sign up as discussant for 1 class

9 EPI 511: Course components Advance reading 1 required paper + 1 optional paper per course session Lecture + Discussion discussants: each student to sign up as discussant for 1 class Video of each class will be posted on the course www site <1hr after class.

11 EPI 511: Course components Advance reading 1 required paper + 1 optional paper per course session Lecture + Discussion discussants: each student to sign up as discussant for 1 class Experiences 5 take-home projects due Tue Jan 31, ..., Tue Feb 28

13 EPI 511: Course components Advance reading 1 required paper + 1 optional paper per course session Lecture + Discussion discussants: each student to sign up as discussant for 1 class Experiences 5 take-home projects due Tue Jan 31, ..., Tue Feb 28 short Research Paper due Fri Mar 10

14 EPI 511: Course components Advance reading 1 required paper + 1 optional paper per course session Lecture + Discussion discussants: each student to sign up as discussant for 1 class Experiences 5 take-home projects due Tue Jan 31, ..., Tue Feb 28 short Research Paper due Fri Mar 10 self-assessment Opportunity 20min exam (date will not be announced in advance)

15 EPI 511: Outcome measures Advance reading (0% of course grade) 1 required paper + 1 optional paper per course session Lecture + Discussion (0% of course grade) discussants: each student to sign up as discussant for 1 class

16 EPI 511: Outcome measures Advance reading (0% of course grade) 1 required paper + 1 optional paper per course session Lecture + Discussion (0% of course grade) discussants: each student to sign up as discussant for 1 class Experiences (60% of course grade) 5 take-home projects (data and programming intensive)

17 Approaches to Scientific Understanding Love is Understanding. -- Madonna Data is Understanding. -- Alkes

18 EPI 511: Outcome measures Advance reading (0% of course grade) 1 required paper + 1 optional paper per course session Lecture + Discussion (0% of course grade) discussants: each student to sign up as discussant for 1 class Experiences (60% of course grade) 5 take-home projects (data and programming intensive)

19 Approaches to Scientific Understanding Understanding Data requires Fixing Bugs.

20 Genetics + data + programming = bright future Gewin 2007 Nature Hayden 2012 Nature

21 EPI 511: Outcome measures Advance reading (0% of course grade) 1 required paper + 1 optional paper per course session Lecture + Discussion (0% of course grade) discussants: each student to sign up as discussant for 1 class Experiences (60% of course grade) 5 take-home projects (data and programming intensive) short Research Paper (40% of course grade) 1,000-1,500 words (suggested topics provided on Feb 16)

22 EPI 511: Outcome measures Advance reading (0% of course grade) 1 required paper + 1 optional paper per course session Lecture + Discussion (0% of course grade) discussants: each student to sign up as discussant for 1 class Experiences (60% of course grade) 5 take-home projects (data and programming intensive) short Research Paper (40% of course grade) 1,000-1,500 words (suggested topics provided on Feb 16) self-assessment Opportunity (0% of course grade) 20min exam (date will not be announced in advance)

23 EPI 511: Policy on group work Experiences (60% of course grade) 5 take-home projects (data and programming intensive) OK to discuss experiences with your colleagues Each piece of code you write should be your own short Research Paper (40% of course grade) 1,000-1,500 words (suggested topics provided on Feb 16) OK to discuss the project with your colleagues Each piece of code you write should be your own Each piece of text you write should be your own

24 EPI 511, Advanced Population and Medical Genetics Week 1: Introduction + HapMap Project Linkage Disequilibrium

25 Outline 1. Introduction to Population Genetics 2. HapMap and HapMap2 projects 3. F_ST 4. HapMap3 and 1000 Genomes projects

27 What is Population Genetics? Population genetics is the study of genetic variation both within and between human populations.

28 Are different human populations actually genetically different?

29 Are different human populations actually genetically different? Slightly. 5-7% of worldwide human genetic variation is due to genetic differences between human populations. The remaining 93-95% of human genetic variation is due to genetic variation within human populations (Rosenberg et al Science).

30 Why study differences between human populations? Learn about human migration patterns and ancient history.

31 Why study differences between human populations? Learn about human migration patterns and ancient history. Improve our power to identify and localize disease genes.

32 Rosenberg et al Nat Rev Genet

33 Bustamante et al Nature; also see Popejoy & Fullerton 2016 Nature

34 Why study differences between human populations? Learn about human migration patterns and ancient history. Improve our power to identify and localize disease genes. Williams et al Nature

35 Why study differences between human populations? Learn about human migration patterns and ancient history. Improve our power to identify and localize disease genes. - Use differences in linkage disequilibrium for fine-mapping. - Avoid false positives due to population stratification. - Signals of natural selection at genes related to disease.

36 Does race exist?

37 For a fun time: go to a population genetics party and ask, "Does race exist?" Worldwide patterns of human genetic variation are best described using continuous clines instead of discrete clusters. (Serre & Paabo 2004 Genome Res) Racial classifications are inadequate descriptors of the distribution of human genetic variation. (Tishkoff & Kidd 2004 Nat Genet)

38 Isn't it politically incorrect to study differences between human populations?

39 Isn't it politically incorrect to study differences between human populations? No. It is not politically incorrect.

40 Isn't it politically incorrect to study differences between human populations? No. It is not politically incorrect. Studies of human population genetics have generated the strongest proof that there is no scientific basis for racism. (Cavalli-Sforza 2005 Nat Rev Genet) also see Cavalli-Sforza et al The History and Geography of Human Genes

41 Outline 1. Introduction to Population Genetics 2. HapMap and HapMap2 projects 3. F_ST 4. HapMap3 and 1000 Genomes projects

42 The International HapMap Project (International HapMap Consortium 2005 Nature) CEU (European) YRI (Nigerian) CHB (Chinese) JPT (Japanese)

43 The International HapMap Project: 270 samples from 4 populations
Code  Population         Location  Samples
CEU   northern European  USA       90
CHB   Chinese            China     45
JPT   Japanese           Japan     44
YRI   Yoruba             Nigeria   90

44 The International HapMap Project (International HapMap Consortium 2005 Nature) Phase I HapMap: >1,000,000 SNPs CEU (European) YRI (Nigerian) CHB (Chinese) JPT (Japanese)

45 The International HapMap Project (International HapMap Consortium 2007 Nature) Phase II HapMap: >3,000,000 SNPs CEU (European) YRI (Nigerian) CHB (Chinese) JPT (Japanese)

46 What is a SNP? A Single Nucleotide Polymorphism (SNP) is a letter of the genome that differs in different individuals (e.g. G/T).

47 What is a SNP? A Single Nucleotide Polymorphism (SNP) is a letter of the genome that differs in different individuals (e.g. G/T). Each SNP corresponds to one single mutation event in history, e.g. G mutated to T in one single ancestor. G = ancestral allele, T = derived allele. Coalescent tree Rosenberg & Nordborg 2002 Nat Rev Genet

48 What is a SNP: physical position Each SNP has a physical position on a chromosome. [Table: SNP ID (rs number), chromosome, physical position (bp); example rows lost in extraction.]

49 What is a SNP: physical vs. genetic position Each SNP has a physical and genetic position on a chromosome. [Table: SNP ID (rs number), chromosome, physical position (bp), genetic position (Morgans); example rows lost in extraction.] 1 recombination event per Morgan per generation. Genome-wide recombination rate is about 1 cM / Mb. [cM = centimorgan = 1/100 Morgan, Mb = Megabase = 10^6 bp] Thus, 1 Morgan is roughly 100 Mb = 10^8 bp on average.
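
The unit conversion on this slide can be sketched in a few lines. This is only an illustration of the genome-wide rule of thumb (about 1 cM per Mb); the function name and the example positions are hypothetical, and real analyses would use a recombination map because local rates vary widely.

```python
# Rule of thumb from the slide: about 1 cM per Mb, averaged genome-wide.
CM_PER_MB = 1.0


def approx_genetic_distance_morgans(pos1_bp, pos2_bp, cm_per_mb=CM_PER_MB):
    """Approximate genetic distance (in Morgans) between two physical positions (in bp)."""
    distance_mb = abs(pos2_bp - pos1_bp) / 1e6   # bp -> Mb
    distance_cm = distance_mb * cm_per_mb        # Mb -> cM (average rate)
    return distance_cm / 100.0                   # cM -> Morgans


# Two hypothetical SNPs 100 Mb apart are roughly 1 Morgan apart.
print(approx_genetic_distance_morgans(1_000_000, 101_000_000))
```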

50 HapMap project: Summary of main results 3.1 million SNPs successfully genotyped using Perlegen genotyping technology (Hinds et al Science). These 3.1 million SNPs: about 30% of all common SNPs (defined as SNPs with minor allele frequency >5%).

51 HapMap: 270 samples from 4 populations
Code  Population         Location  Samples
CEU   northern European  USA       90
CHB   Chinese            China     45
JPT   Japanese           Japan     44
YRI   Yoruba             Nigeria   90
Affymetrix and Illumina chips

52 HapMap project: Summary of main results 3.1 million SNPs successfully genotyped using Perlegen genotyping technology (Hinds et al Science). These 3.1 million SNPs: about 30% of all common SNPs (defined as SNPs with minor allele frequency >5%). Properties of SNPs are influenced by discovery sampling: HapMap relied on nearly any piece of information available. Clark et al Genome Res; also see Keinan et al Nat Genet

53 Summary of main results, continued Understanding genetic differences between populations. Patterns of linkage disequilibrium both within and across populations. Most common SNPs in the human genome are in strong linkage disequilibrium with at least one HapMap SNP [avg r in 10 sequenced ENCODE regions].

54 Genetic differences between HapMap populations (International HapMap Consortium 2005 and 2007 Nature) [Figure: frequency of the C allele of an example SNP in the HapMap populations; surviving values: 50% and 77% (SNP ID and one value lost in extraction).]

55 Genetic differences between HapMap populations (International HapMap Consortium 2005 and 2007 Nature) [Figure labels: F_ST = 0.11, F_ST = 0.16, F_ST = 0.19.] Note: F_ST accounts for sampling error due to finite sample size.

56 Populations can be distinguished using a large number of genetic markers [Figure: Principal Components Analysis using 100 markers.]

57 Populations can be distinguished using a large number of genetic markers [Figure: Principal Components Analysis using 3 million markers.]
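
A toy version of this idea can be run without any genotype files. The sketch below is not the analysis behind the figures: it simulates two hypothetical populations with a Beta drift model (an assumption chosen for illustration, with F_ST = 0.1), then finds the first principal component of the centered genotype matrix by power iteration. All function names and parameter values are made up for the example.

```python
import random


def simulate_two_populations(n_per_pop, n_snps, fst, seed=0):
    """Genotype rows (one per individual) for two populations under a Beta drift model."""
    rng = random.Random(seed)
    scale = (1 - fst) / fst
    rows = [[] for _ in range(2 * n_per_pop)]
    for _ in range(n_snps):
        p = rng.uniform(0.1, 0.9)  # ancestral allele frequency
        freqs = [rng.betavariate(p * scale, (1 - p) * scale) for _ in range(2)]
        for i, row in enumerate(rows):
            q = freqs[i // n_per_pop]  # frequency in this individual's population
            row.append((rng.random() < q) + (rng.random() < q))  # genotype 0, 1 or 2
    return rows


def first_principal_component(rows, n_iter=100):
    """Top eigenvector of the individual-by-individual Gram matrix, by power iteration."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    gram = [[sum(a * b for a, b in zip(u, v)) for v in centered] for u in centered]
    vec = [1.0] + [0.0] * (n - 1)  # arbitrary starting vector
    for _ in range(n_iter):
        vec = [sum(g * v for g, v in zip(gram_row, vec)) for gram_row in gram]
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
    return vec


rows = simulate_two_populations(n_per_pop=20, n_snps=500, fst=0.1)
pc1 = first_principal_component(rows)
print("population 1 mean PC1 score:", sum(pc1[:20]) / 20)
print("population 2 mean PC1 score:", sum(pc1[20:]) / 20)
```

With a few hundred markers the two simulated groups already fall on opposite sides of PC1; with fewer markers the separation becomes noisier, which is the contrast the two figures draw.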

58 Outline 1. Introduction to Population Genetics 2. HapMap and HapMap2 projects 3. F_ST 4. HapMap3 and 1000 Genomes projects

59 Genetic differences between HapMap populations (International HapMap Consortium 2005 and 2007 Nature) [Figure labels: F_ST = 0.11, F_ST = 0.16, F_ST = 0.19.]

60 Weir & Hill 2002 Annu Rev Genet, Bhatia et al Genome Res Defining vs. Estimating F_ST $F_{ST}$ is an underlying parameter that depends on the two populations, but does not depend on a particular finite sample. $\hat{F}_{ST}$ is an estimate of the underlying $F_{ST}$ that depends on a particular finite sample that is analyzed.

61 Weir & Hill 2002 Annu Rev Genet, Bhatia et al Genome Res Defining F_ST Definition: The F_ST between two populations is the value such that the allele frequency difference between the two populations has mean 0 and variance $2F_{ST}\,p(1-p)$, where p is the allele frequency in the ancestral population. [Diagram: ancestral frequency $p$ splits into $p_1$ and $p_2$; each branch has variance $F_{ST}\,p(1-p)$.]

62 Weir & Hill 2002 Annu Rev Genet, Bhatia et al Genome Res Defining F_ST Definition: The F_ST between two populations is the value such that the allele frequency difference between the two populations has mean 0 and variance $2F_{ST}\,p(1-p)$, where p is the allele frequency in the ancestral population. [Diagram: ancestral frequency $p$ splits into $p_1$ and $p_2$; each branch has variance $F_{ST}\,p(1-p)$.] $p_1 \sim N\big(p,\ F_{ST}\,p(1-p)\big)$

63 Weir & Hill 2002 Annu Rev Genet, Bhatia et al Genome Res Defining F_ST Definition: The F_ST between two populations is the value such that the allele frequency difference between the two populations has mean 0 and variance $2F_{ST}\,p(1-p)$, where p is the allele frequency in the ancestral population. [Diagram: ancestral frequency $p$ splits into $p_1$ and $p_2$; each branch has variance $F_{ST}\,p(1-p)$.] $p_1 \sim \mathrm{Beta}\big(p(1-F_{ST})/F_{ST},\ (1-p)(1-F_{ST})/F_{ST}\big)$
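
The Beta model on this slide can be checked numerically. The sketch below draws $p_1$ and $p_2$ independently from the stated Beta distribution and compares the empirical variance of $p_1 - p_2$ with $2F_{ST}\,p(1-p)$. The function name, the values p = 0.3 and F_ST = 0.1, and the number of draws are illustrative assumptions, not values from the lecture.

```python
import random


def simulate_freq_difference(p, fst, n_draws=100_000, seed=0):
    """Empirical mean and variance of p1 - p2 under the Beta drift model."""
    rng = random.Random(seed)
    a = p * (1 - fst) / fst          # first Beta shape parameter
    b = (1 - p) * (1 - fst) / fst    # second Beta shape parameter
    diffs = [rng.betavariate(a, b) - rng.betavariate(a, b) for _ in range(n_draws)]
    mean = sum(diffs) / n_draws
    var = sum((d - mean) ** 2 for d in diffs) / n_draws
    return mean, var


p, fst = 0.3, 0.1
mean, var = simulate_freq_difference(p, fst)
print("empirical mean:", mean)
print("empirical variance:", var)
print("2 * F_ST * p * (1 - p):", 2 * fst * p * (1 - p))
```

The match follows from the Beta variance formula: with these shape parameters each of $p_1$ and $p_2$ has mean $p$ and variance $F_{ST}\,p(1-p)$, so their difference has twice that variance.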

64 Weir & Hill 2002 Annu Rev Genet, Bhatia et al Genome Res Defining F_ST Definition: The F_ST between two populations is the value such that the allele frequency difference between the two populations has mean 0 and variance $2F_{ST}\,p(1-p)$, where p is the allele frequency in the ancestral population. OR The F_ST between two populations is equal to the proportion of genotypic variance in a set of N individuals from each population that is attributable to population differences.

65 Defining F_ST Theorem 1: The F_ST between two populations is the value such that the allele frequency difference between the two populations has mean 0 and variance $2F_{ST}\,p(1-p)$, where p is the allele frequency in the ancestral population. => The F_ST between two populations is equal to the proportion of genotypic variance in a set of N individuals from each population that is attributable to population differences.

66 Defining F_ST Proof: Let $p_{avg} = (p_1 + p_2)/2$. Total genotypic variance is $2p_{avg}(1-p_{avg}) \approx 2p(1-p)$ [Note that individuals are diploid: genotype = 0 or 1 or 2. Binomial sampling with n=2.]

67 Defining F_ST Proof: Let $p_{avg} = (p_1 + p_2)/2$. Total genotypic variance is $2p_{avg}(1-p_{avg}) \approx 2p(1-p)$ [Note that individuals are diploid: genotype = 0 or 1 or 2. Binomial sampling with n=2.] Genotypic variance attributable to population differences: Suppose we have N data points with value $2p_1$ and N with value $2p_2$. After subtracting the average value $(p_1 + p_2)$, we have N data points with value $(p_1 - p_2)$ and N with value $(p_2 - p_1)$. Since $p_1$ and $p_2$ each have variance $F_{ST}\,p(1-p)$, it follows that $(p_1 - p_2)$ and $(p_2 - p_1)$ each have variance $2F_{ST}\,p(1-p)$.

68 Defining F_ST Proof: Let $p_{avg} = (p_1 + p_2)/2$. Total genotypic variance is $2p_{avg}(1-p_{avg}) \approx 2p(1-p)$ [Note that individuals are diploid: genotype = 0 or 1 or 2. Binomial sampling with n=2.] Genotypic variance attributable to population differences: Suppose we have N data points with value $2p_1$ and N with value $2p_2$. After subtracting the average value $(p_1 + p_2)$, we have N data points with value $(p_1 - p_2)$ and N with value $(p_2 - p_1)$. Since $p_1$ and $p_2$ each have variance $F_{ST}\,p(1-p)$, it follows that $(p_1 - p_2)$ and $(p_2 - p_1)$ each have variance $2F_{ST}\,p(1-p)$. $2F_{ST}\,p(1-p) \,/\, [2p(1-p)] = F_{ST}$. Q.E.D.

69 Defining F_ST Theorem 1': The F_ST between two populations is the value such that the allele frequency difference between the two populations has mean 0 and variance $2F_{ST}\,p(1-p)$, where p is the allele frequency in the ancestral population. => The proportion of genotypic variance in a set of $\alpha n$ individuals from population 1 and $(1-\alpha)n$ individuals from population 2 that is attributable to population differences is equal to $4\alpha(1-\alpha)\,F_{ST}$.
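
The slide states the unequal-sample-size result without proof. One way to see it, following the same steps as the equal-sample-size proof, is sketched below. This derivation is a reconstruction rather than the lecture's own, and it uses the same approximation (total genotypic variance taken as $2p(1-p)$).

```latex
% A fraction \alpha of the data points has value 2p_1 and a fraction (1-\alpha) has value 2p_2.
% The variance of such a two-point distribution is \alpha(1-\alpha) times the squared gap.
\begin{align*}
\text{between-population variance}
  &= \alpha(1-\alpha)\,(2p_1 - 2p_2)^2
   = 4\alpha(1-\alpha)\,(p_1 - p_2)^2 \\
E\big[(p_1 - p_2)^2\big]
  &= \mathrm{Var}(p_1 - p_2) = 2F_{ST}\,p(1-p) \\
\text{proportion of variance}
  &\approx \frac{4\alpha(1-\alpha)\cdot 2F_{ST}\,p(1-p)}{2p(1-p)}
   = 4\alpha(1-\alpha)\,F_{ST}
\end{align*}
% Setting \alpha = 1/2 recovers Theorem 1: 4 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot F_{ST} = F_{ST}.
```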

70 Genetic differences between HapMap populations (International HapMap Consortium 2005 and 2007 Nature) [Figure labels: F_ST = 0.11, F_ST = 0.16, F_ST = 0.19.]

71 Genetic differences between HapMap populations (International HapMap Consortium 2005 and 2007 Nature) F_ST = 0.11: $[2F_{ST}\,p(1-p)]^{1/2} = 0.23$ for p = 0.5. F_ST = 0.16: $[2F_{ST}\,p(1-p)]^{1/2} = 0.28$ for p = 0.5. F_ST = 0.19: $[2F_{ST}\,p(1-p)]^{1/2} = 0.31$ for p = 0.5.
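
The three numbers on this slide are the standard deviation of the allele frequency difference implied by the definition of F_ST. A short check of the arithmetic (the function name is made up for the example):

```python
import math


def typical_freq_difference(fst, p=0.5):
    """Standard deviation of p1 - p2 implied by F_ST: sqrt(2 * F_ST * p * (1 - p))."""
    return math.sqrt(2 * fst * p * (1 - p))


for fst in (0.11, 0.16, 0.19):
    print(f"F_ST = {fst}: {typical_freq_difference(fst):.2f}")
```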

72 Genetic distances (F_ST) between European American subpopulations [Figure: pairwise F_ST among Ashkenazi, Southeast Eur. and Northwest Eur.; numeric values lost in extraction.] Price, Butler et al PLoS Genet

73 Genetic distances (F_ST) between European American subpopulations [Figure: pairwise F_ST among Ashkenazi, Southeast Eur. and Northwest Eur., each with $[2F_{ST}\,p(1-p)]^{1/2}$ for p = 0.5; numeric values lost in extraction.] Price, Butler et al PLoS Genet

74 Genetic distances (F_ST) between East Asian subpopulations [Figure: F_ST between Chinese and Japanese, with $[2F_{ST}\,p(1-p)]^{1/2}$ for p = 0.5; numeric values lost in extraction.] International HapMap Consortium 2007 Nature

75 Genetic distances (F_ST) between West African subpopulations [Figure: F_ST between Yoruba (Nigeria) and Luhya (Kenya), with $[2F_{ST}\,p(1-p)]^{1/2}$ for p = 0.5; numeric values lost in extraction.] International HapMap3 Consortium 2010 Nature

76 How do we estimate F_ST? $p_1$ and $p_2$ are allele frequencies in 2 populations. $\mathrm{Var}(p_1 - p_2) = 2F_{ST}\,p(1-p)$. Thus, estimate $F_{ST} = \mathrm{Var}\big((p_1 - p_2) / [2p(1-p)]^{1/2}\big) = E\big((p_1 - p_2)^2 / [2p(1-p)]\big)$.

77 How do we estimate F_ST? $p_1$ and $p_2$ are allele frequencies in 2 populations. $\mathrm{Var}(p_1 - p_2) = 2F_{ST}\,p(1-p)$. Thus, estimate $F_{ST} = \mathrm{Var}\big((p_1 - p_2) / [2p(1-p)]^{1/2}\big) = E\big((p_1 - p_2)^2 / [2p(1-p)]\big)$. A PROBLEM: we don't get to observe p (ancestral frequency). SOLUTION: approximate $p \approx p_{avg} = (p_1 + p_2)/2$.

78 How do we estimate F_ST? $p_1$ and $p_2$ are allele frequencies in 2 populations. $\mathrm{Var}(p_1 - p_2) = 2F_{ST}\,p(1-p)$. Thus, estimate $F_{ST} = \mathrm{Var}\big((p_1 - p_2) / [2p(1-p)]^{1/2}\big) = E\big((p_1 - p_2)^2 / [2p(1-p)]\big)$. A BIGGER PROBLEM: we don't get to observe $p_1$ and $p_2$. We only get to observe sample allele frequencies $\hat{p}_1$ and $\hat{p}_2$ in sample sizes $N_1$ (from pop. 1) and $N_2$ (from pop. 2).

79 How do we estimate F_ST? $p_1$ and $p_2$ are allele frequencies in 2 populations. $\mathrm{Var}(p_1 - p_2) = 2F_{ST}\,p(1-p)$. Thus, estimate $F_{ST} = \mathrm{Var}\big((p_1 - p_2) / [2p(1-p)]^{1/2}\big) = E\big((p_1 - p_2)^2 / [2p(1-p)]\big)$. SOLUTION: Since $\mathrm{Var}(\hat{p}_1 - \hat{p}_2) \approx [2F_{ST} + 1/(2N_1) + 1/(2N_2)]\,p(1-p)$, estimate $F_{ST} = E\big([(\hat{p}_1 - \hat{p}_2)^2 - (1/(2N_1) + 1/(2N_2))\,p(1-p)] / [2p(1-p)]\big)$ (where we approximate $p \approx (\hat{p}_1 + \hat{p}_2)/2$). Some details omitted; see Bhatia et al Genome Res

80 How do we estimate F_ST? $p_1$ and $p_2$ are allele frequencies in 2 populations. $\mathrm{Var}(p_1 - p_2) = 2F_{ST}\,p(1-p)$. Thus, estimate $F_{ST} = \mathrm{Var}\big((p_1 - p_2) / [2p(1-p)]^{1/2}\big) = E\big((p_1 - p_2)^2 / [2p(1-p)]\big)$. SOLUTION: Since $\mathrm{Var}(\hat{p}_1 - \hat{p}_2) \approx [2F_{ST} + 1/(2N_1) + 1/(2N_2)]\,p(1-p)$, estimate $F_{ST} = E\big([(\hat{p}_1 - \hat{p}_2)^2 - (1/(2N_1) + 1/(2N_2))\,p(1-p)] / [2p(1-p)]\big)$. OR $\hat{F}_{ST} = \dfrac{\sum_i \big[(\hat{p}_{i1} - \hat{p}_{i2})^2 - (1/(2N_1) + 1/(2N_2))\,p_i(1-p_i)\big]}{\sum_i \big[2p_i(1-p_i)\big]}$ (where i indexes SNPs). Some details omitted; see Bhatia et al Genome Res
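
The ratio-of-sums estimator on this slide can be sketched directly. The code below implements only the simplified formula as written here, with $p_i$ approximated by the average of the two sample frequencies; it is not the full estimator of Bhatia et al, whose omitted details are not reproduced. The simulation helper, its Beta drift model and all parameter values are assumptions made for the example.

```python
import random


def estimate_fst(freqs1, freqs2, n1, n2):
    """Ratio-of-sums F_ST estimate from per-SNP sample allele frequencies.

    n1 and n2 are the numbers of diploid individuals sampled per population.
    """
    correction = 1 / (2 * n1) + 1 / (2 * n2)   # sampling-noise term
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        p = (p1 + p2) / 2                      # stand-in for the ancestral frequency
        numerator += (p1 - p2) ** 2 - correction * p * (1 - p)
        denominator += 2 * p * (1 - p)
    return numerator / denominator


def simulate_sample_freqs(true_fst, n1, n2, n_snps, seed=0):
    """Hypothetical data: Beta drift model, then sampling of 2N chromosomes."""
    rng = random.Random(seed)
    scale = (1 - true_fst) / true_fst
    freqs1, freqs2 = [], []
    for _ in range(n_snps):
        p = rng.uniform(0.1, 0.9)
        p1 = rng.betavariate(p * scale, (1 - p) * scale)
        p2 = rng.betavariate(p * scale, (1 - p) * scale)
        freqs1.append(sum(rng.random() < p1 for _ in range(2 * n1)) / (2 * n1))
        freqs2.append(sum(rng.random() < p2 for _ in range(2 * n2)) / (2 * n2))
    return freqs1, freqs2


f1, f2 = simulate_sample_freqs(true_fst=0.1, n1=50, n2=50, n_snps=5000)
print("estimated F_ST (simulated with F_ST = 0.1):", estimate_fst(f1, f2, 50, 50))
```

Summing numerator and denominator across SNPs before dividing, rather than averaging per-SNP ratios, keeps SNPs with small $p_i(1-p_i)$ from dominating the estimate. Because $p_i$ is replaced by the sample average, the simple version above is expected to be slightly off from the simulated value.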

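81 Drift vs. Divergence [Figure: Divergence (per 1000bp of DNA) and Drift (F_ST) for pairs among YRI, CEU and CHB; example samples NA18488, NA06989, NA18597; numeric values lost in extraction.] Keinan et al Nat Genet

82 Drift vs. Divergence [Figure: Drift (F_ST) and Divergence (generations) for pairs among YRI, CEU and CHB; surviving labels: ~30K gen., ~20K gen., ~20K gen.; example samples NA18488, NA06989, NA18597.] Divergence in generations is based on a mutation rate of order 10^-8 (leading digits lost in extraction) (Kong et al Nature, Sun et al Nat Genet). Keinan et al Nat Genet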

83 Outline 1. Introduction to Population Genetics 2. HapMap and HapMap2 projects 3. F_ST 4. HapMap3 and 1000 Genomes projects

84 HapMap: 270 samples from 4 populations
Code  Population         Location  Samples
CEU   northern European  USA       90
CHB   Chinese            China     45
JPT   Japanese           Japan     44
YRI   Yoruba             Nigeria   90
Affymetrix and Illumina chips
Perkel 2008 Nat Methods

85 The HapMap Project: Work is done, relax on beach?

86 Beyond HapMap: what the world still needs Larger sample sizes for analyses of linkage disequilibrium More complete representation of world population diversity e.g. South Asian and Native American genetic variation Analyses of copy number variation (CNV) Low-frequency variants (minor allele frequency <5%)

87 The International HapMap3 Project: 1,260 samples from 11 diverse populations International HapMap3 Consortium 2010 Nature

88 HapMap3: 1,260 samples from 11 populations
Code  Population         Location  Samples
CEU   northern European  USA       180
CHB   Chinese            China     90
JPT   Japanese           Japan     90
YRI   Yoruba             Nigeria   180
TSI   Tuscan             Italy     90
CHD   Chinese            USA       100
LWK   Luhya              Kenya     90
MKK   Maasai             Kenya     180
ASW   African-American   USA       90
MXL   Mexican-American   USA       90
GIH   Gujarati-American  USA       90

89 The HapMap3 project Larger sample sizes for analyses of linkage disequilibrium More complete representation of world population diversity e.g. South Asian and Native American genetic variation Analyses of copy number variation (CNV) Low-frequency variants (minor allele frequency <5%) International HapMap3 Consortium 2010 Nature

90 Data generation: SNPs and CNVs Affymetrix 6.0 array 900K SNPs 940K copy-number probes Illumina Infinium 1M array 1M SNPs, of which 80K targeted at CNV regions 1.5M SNPs passed QC in all populations (99.3% concordance for 250K SNPs on both arrays) Note: only 1.5M SNPs, versus 3.1 million SNPs in HapMap2 International HapMap3 Consortium 2010 Nature

91 Not all HapMap3 populations are similar to a population from HapMap
HapMap3 population       Closest pop. from HapMap  F_ST
TSI (Tuscan)             CEU                       [value missing]
CHD (Chinese)            CHB                       [value missing]
LWK (Luhya)              YRI                       [value missing]
MKK (Maasai)             YRI                       0.03
ASW (African-American)   YRI                       0.01
MXL (Mexican-American)   CEU                       0.04
GIH (Gujarati-American)  CEU                       0.04

92 Approaches to Scientific Understanding Love is Understanding. -- Madonna Data is Understanding. -- Alkes

93 CEU.ind: HapMap3 data: individual files
NA06989 F CEU
NA11891 M CEU
NA11843 M CEU
NA12341 F CEU
NA12739 M CEU
[sample ID] [sex] [popname]

94 CEU.snp: HapMap3 data: SNP files
[Five example rows; SNP IDs, chromosomes and positions lost in extraction. Surviving allele columns: C T, C T, C T, A G, G A.]
[SNP ID] [chr] [0.0] [position] [ref] [var]

95 CEU.geno: HapMap3 data: genotype files [Each line is 1 SNP, each column is 1 indiv.] [Number of copies of reference allele: 0 or 1 or 2. 9 denotes missing data.] Note: the HapMap3 data files for this course are restricted to ~700K SNPs that are common (MAF>5%) in every population.
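
The three file types described on these slides can be read with a few short functions. This is a minimal sketch under stated assumptions: the column layouts follow the bracketed descriptions above, the third .snp column is read as a number, and each .geno line is assumed to hold one character per individual (any whitespace between characters is ignored). The function names are hypothetical and are not part of the course materials.

```python
def read_ind(path):
    """Return a list of (sample_id, sex, population) tuples from a .ind file."""
    with open(path) as handle:
        return [tuple(line.split()[:3]) for line in handle if line.strip()]


def read_snp(path):
    """Return a list of (snp_id, chrom, genetic_pos, physical_pos, ref, var) tuples."""
    rows = []
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            snp_id, chrom, gen_pos, phys_pos, ref, var = line.split()[:6]
            rows.append((snp_id, chrom, float(gen_pos), int(phys_pos), ref, var))
    return rows


def read_geno(path):
    """Return one list per SNP; each entry is 0, 1, 2 or 9 (missing) for one individual."""
    with open(path) as handle:
        return [[int(ch) for ch in line if ch in "0129"] for line in handle if line.strip()]


def ref_allele_frequency(genotypes):
    """Reference allele frequency for one SNP, ignoring missing genotypes (coded 9)."""
    called = [g for g in genotypes if g != 9]
    if not called:
        return None
    return sum(called) / (2 * len(called))   # each individual carries 2 alleles


# One SNP, four individuals, one of them missing: (2 + 1 + 0) / (2 * 3) = 0.5
print(ref_allele_frequency([2, 1, 9, 0]))
```

Row i of the .geno file corresponds to row i of the .snp file, and column j to row j of the .ind file, so per-population allele frequencies only require reading each population's three files together.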

96 Beyond HapMap: what the world still needs Larger sample sizes for analyses of linkage disequilibrium More complete representation of world population diversity e.g. South Asian and Native American genetic variation Analyses of copy number polymorphisms (CNV) Low-frequency variants (minor allele frequency <5%)

97 Common Disease/Common Variant hypothesis For common diseases, there will be one or a few predominating disease alleles with relatively high frequencies at each of the major underlying disease loci Lander 1996 Science; Reich & Lander 2001 Trends Genet reviewed in Gibson 2012 Nat Rev Genet, Visscher et al Am J Hum Genet

98 Are rare and low-frequency variants important? (to be continued, Thu of Week 6) Visscher et al Am J Hum Genet

99 Are rare and low-frequency variants important? (to be continued, Thu of Week 6) Gibson 2012 Nat Rev Genet

100 Are rare and low-frequency variants important? (to be continued, Thu of Week 6) Kaiser 2012 Science

101 HapMap3 1Mb pilot sequencing study and 1000 Genomes pilot projects
HapMap3 pilot sequencing: kb regions spanning 1Mb (high coverage: Sanger sequencing); 692 individuals from 10 HapMap3 populations
1000 Genomes Trio pilot project: Genome-wide (high coverage: 42x); 6 individuals (one CEU trio and one YRI trio)
1000 Genomes Low-coverage pilot project: Genome-wide (low coverage: 2x-6x); 179 individuals from CEU, YRI, CHB, JPT populations
1000 Genomes Exon pilot project: 8,140 exons spanning 1.4Mb from 906 genes (high coverage: >50x); 697 individuals from 7 HapMap3 populations
International HapMap3 Consortium 2010 Nature; 1000 Genomes Project Consortium 2010 Nature

102 Sample size and SNP discovery (per Mb) International HapMap3 Consortium 2010 Nature

103 The 1000 Genomes (1000G) Project Sequence the entire genomes of 1,092 individuals: 379 of European ancestry (Europe and USA) 286 of East Asian ancestry (Asia) 246 of African ancestry (Africa and USA) 181 of Latino ancestry (Latin America and USA) Use next-generation sequencing technologies (~4x coverage): e.g. Illumina, 454, SOLiD (read lengths bp) (Metzker 2010 Nat Rev Genet, Davey et al Nat Rev Genet, also see Nielsen et al Nat Rev Genet) 1000 Genomes Project Consortium 2012 Nature

104 1000G project: Summary of main results 38 million SNPs discovered and successfully genotyped. Most of these are rare and low-frequency variants. The 38 million SNPs include: 99.7% of all SNPs with minor allele frequency 5%; 98% of all SNPs with minor allele frequency 1% (***); 50% of all SNPs with minor allele frequency 0.1%; based on an independent UK European sample. ***: stated goal to identify >95% of SNPs with frequency 1% was successfully achieved. 1000 Genomes Project Consortium 2012 Nature

105 Common variants are shared across populations, but rare variants are often population-private 1000 Genomes Project Consortium 2012 Nature

106 1000G project: the final phase Sequence the entire genomes of 2,504 individuals: 503 of European ancestry (Europe and USA) 504 of East Asian ancestry (Asia) 661 of African ancestry (Africa and USA) 347 of Latino ancestry (Latin America and USA) 489 of South Asian ancestry (South Asia and USA) Use next-generation sequencing technologies (~7x coverage): Illumina only (read lengths bp only) 85 million SNPs, of which 64 million have MAF<0.5% Related resource: UK10K project: 7x WGS of 3,781 UK samples (UK10K Consortium 2015 Nature; also see Gudbjartsson et al Nat Genet) 1000 Genomes Project Consortium 2015 Nature

107 1000G project: the final phase Sequence the entire genomes of 2,504 individuals: 503 of European ancestry (Europe and USA) 504 of East Asian ancestry (Asia) 661 of African ancestry (Africa and USA) 347 of Latino ancestry (Latin America and USA) 489 of South Asian ancestry (South Asia and USA) Use next-generation sequencing technologies (~7x coverage): Illumina only (read lengths bp only) 85 million SNPs, of which 64 million have MAF<0.5% 1000 Genomes Project Consortium 2015 Nature; also see UK10K Consortium 2015 Nature, Gudbjartsson et al Nat Genet, McCarthy et al Nat Genet

108 What about rare variants? The 1000G project has identified most low-frequency variants (minor allele frequency 1%-5%). These variants can be placed on genotyping arrays or imputed (see Thu of Week 1)

109 What about rare variants? The 1000G project has identified most low-frequency variants (minor allele frequency 1%-5%). These variants can be placed on genotyping arrays or imputed (see Thu of Week 1) Rare variants: most have not been identified by 1000 Genomes! Must sequence disease samples directly. Past focus has been mostly on exome sequencing, but now shifting to whole-genome sequencing. (to be continued, Thu of Week 6) Kiezun et al Nat Genet, Tennessen et al Science, Pasaniuc et al Nat Genet, Purcell et al Nature, Do et al Nature, Cai et al Nature. Reviewed in Goldstein et al Nat Rev Genet, Lee et al Am J Hum Genet, Zuk et al PNAS

110 Conclusions Human populations are slightly genetically different. These differences may be important for disease mapping. (see Thu slides: Linkage Disequilibrium.) F_ST quantifies differences between human populations. HapMap, HapMap2, HapMap3 and 1000 Genomes projects provide a valuable resource for common & low-frequency variants (but most rare variants have not yet been identified).

111 EPI 511, Advanced Population and Medical Genetics Week 1: Intro + HapMap / 1000 Genomes Linkage Disequilibrium

112 EPI 511: Course components Advance reading 1 required paper + 1 optional paper per course session Lecture + Discussion discussants: each student to sign up as discussant for 1 class

113 Outline 1. Introduction to Linkage Disequilibrium 2. LD and Tag SNPs 3. LD and imputation 4. LD and fine-mapping

114 Outline 1. Introduction to Linkage Disequilibrium 2. LD and Tag SNPs 3. LD and imputation 4. LD and fine-mapping

115 Linkage Disequilibrium Definition: Linkage Disequilibrium (LD) refers to correlations between genotypes of nearby markers.

116 Linkage Disequilibrium Definition: Linkage Disequilibrium (LD) refers to correlations between genotypes of nearby markers. Linkage Disequilibrium Association Studies Linkage Disequilibrium Linkage Mapping (reviewed in Ott et al Nat Rev Genet)

117 Linkage Disequilibrium: Example (individuals in rows, 3 billion letters per genome)
A A G A T T A A C G T T G G C C A A...
A A G G T T A A C C T T G G C T A A...
A A A A T T A A G G T T G G T C A A...
A A G G T T A A C C T T G G T T A A...
A A G A T T A A C G T T G G C T A A...
A A G G T T A A C C T T G G C T A A...
A A G A T T A A C C T T G G C C A A...
A A G G T T A A C C T T G G T T A A...
SNP 1 and SNP 2: YES, in LD

118 Linkage Disequilibrium: Example (individuals in rows, 3 billion letters per genome)
A A G A T T A A C G T T G G C C A A...
A A G G T T A A C C T T G G C T A A...
A A A A T T A A G G T T G G T C A A...
A A G G T T A A C C T T G G T T A A...
A A G A T T A A C G T T G G C T A A...
A A G G T T A A C C T T G G C T A A...
A A G A T T A A C C T T G G C C A A...
A A G G T T A A C C T T G G T T A A...
SNP 1, SNP 2, SNP 3: YES, in LD; NOT in LD

119 Linkage Disequilibrium: Example [Figure: individuals by 3 billion letters, with SNP 1, SNP 2 and SNP 3 marked.] r^2 = 1, in LD; r^2 = 0, NOT in LD. r^2 is squared correlation.

120 Linkage Disequilibrium: Example [Figure: individuals by 3 billion letters, with SNP 1, SNP 2 and SNP 3 marked.] r^2 = 1, in LD; r^2 = 0.7, partial LD. r^2 is squared correlation.

122 Genotypes vs. Haplotypes: phasing Genotypes Haplotypes Individuals PHASING Individuals Stephens et al Am J Hum Genet, Browning et al Nat Rev Genet, Williams et al Am J Hum Genet, Delaneau et al Nat Methods, Loh et al. 2016a Nat Genet, Loh et al. 2016b Nat Genet

124 Genotypes vs. Haplotypes: phasing [Figure: genotypes per individual are converted to haplotypes per individual by PHASING.] Fact: r^2 between SNP1 and SNP2 (phased haplotype data) equals r^2 between SNP1 and SNP2 (unphased genotype data), assuming Hardy-Weinberg equilibrium holds
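
The fact on this slide can be illustrated by simulation. The sketch below draws chromosomes from made-up two-SNP haplotype frequencies, pairs them at random into diploid individuals (random pairing is used here as a stand-in for Hardy-Weinberg equilibrium), and compares r^2 computed on haplotypes with r^2 computed on unphased genotypes. All names and frequencies are assumptions for the example.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov * cov / (var_x * var_y)


def compare_phased_and_unphased(n_individuals=20_000, seed=0):
    """Return (haplotype r^2, genotype r^2) for one simulated pair of SNPs."""
    rng = random.Random(seed)
    # (copies of allele A, copies of allele B) carried by each haplotype type
    haplotypes = [(1, 1), (1, 0), (0, 1), (0, 0)]
    weights = [0.25, 0.00, 0.15, 0.60]   # hypothetical haplotype frequencies
    chromosomes = rng.choices(haplotypes, weights=weights, k=2 * n_individuals)
    hap_r2 = squared_correlation([h[0] for h in chromosomes],
                                 [h[1] for h in chromosomes])
    # Consecutive chromosomes form one diploid individual (random pairing).
    g_a = [chromosomes[2 * i][0] + chromosomes[2 * i + 1][0] for i in range(n_individuals)]
    g_b = [chromosomes[2 * i][1] + chromosomes[2 * i + 1][1] for i in range(n_individuals)]
    geno_r2 = squared_correlation(g_a, g_b)
    return hap_r2, geno_r2


hap_r2, geno_r2 = compare_phased_and_unphased()
print("r^2 from phased haplotypes: ", hap_r2)
print("r^2 from unphased genotypes:", geno_r2)
```

The two values agree up to sampling noise because, under random pairing, a genotype is the sum of two independent haplotypes, so both the covariance and the variances double and the correlation is unchanged.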

125 Linkage Disequilibrium: Haplotype Blocks [Figure: individuals by 3 billion letters, with three SNPs marked.] These 3 SNPs form a haplotype block with two main haplotypes

126 LD with phased haplotypes: r^2 vs. D' Slatkin 2008 Nat Rev Genet Consider two SNPs with frequencies $p_A$ and $p_B$ of alleles A, B, and let $p_{AB}$ be the frequency of haplotypes carrying both A and B. Let $g_A$ refer to # copies (0, 1) of allele A for the first SNP. Let $g_B$ refer to # copies (0, 1) of allele B for the second SNP. $r^2 = \dfrac{[E(g_A g_B) - E(g_A)E(g_B)]^2}{\mathrm{Var}(g_A)\,\mathrm{Var}(g_B)} = \dfrac{(p_{AB} - p_A p_B)^2}{p_A(1-p_A)\,p_B(1-p_B)}$

127 LD with phased haplotypes: r^2 vs. D' Slatkin 2008 Nat Rev Genet Consider two SNPs with frequencies $p_A$ and $p_B$ of alleles A, B. Suppose $p_A < p_B < 0.5$. $r^2 = \dfrac{(p_{AB} - p_A p_B)^2}{p_A(1-p_A)\,p_B(1-p_B)}$, $D' = \dfrac{p_{AB} - p_A p_B}{p_A - p_A p_B}$

128 LD with phased haplotypes: r^2 vs. D' Slatkin 2008 Nat Rev Genet Consider two SNPs with frequencies $p_A$ and $p_B$ of alleles A, B. Suppose $p_A < p_B < 0.5$. r^2 and D' are maximized when $p_{AB} = p_A$. $D' = \dfrac{p_{AB} - p_A p_B}{p_A - p_A p_B} \le 1$, $r^2 = \dfrac{(p_{AB} - p_A p_B)^2}{p_A(1-p_A)\,p_B(1-p_B)} \le \dfrac{p_A - p_A p_B}{p_B - p_A p_B}$

129 LD with phased haplotypes: r^2 vs. D' Slatkin 2008 Nat Rev Genet Consider two SNPs with frequencies $p_A$ and $p_B$ of alleles A, B. Suppose $p_A < p_B < 0.5$. r^2 and D' are maximized when $p_{AB} = p_A$. e.g. $p_A = 0.25$, $p_B = 0.4$, $p_{AB} = 0.25$ => $r^2 = 0.5$, $D' = 1$. $D' = \dfrac{p_{AB} - p_A p_B}{p_A - p_A p_B} \le 1$, $r^2 = \dfrac{(p_{AB} - p_A p_B)^2}{p_A(1-p_A)\,p_B(1-p_B)} \le \dfrac{p_A - p_A p_B}{p_B - p_A p_B}$
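
The worked example on this slide can be reproduced in a few lines. In the sketch below, the function name is hypothetical, and the handling of negative D (returning the absolute value of D', with the usual Lewontin normalization) is an assumption that goes beyond the slide, which only covers the case $p_A < p_B < 0.5$ with positive D.

```python
def ld_from_haplotype_freqs(p_a, p_b, p_ab):
    """Return (r^2, |D'|) from allele frequencies p_a, p_b and haplotype frequency p_ab."""
    d = p_ab - p_a * p_b                                  # LD coefficient D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), p_b * (1 - p_a))     # largest possible positive D
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))     # largest possible |D| when D < 0
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Slide example: p_A = 0.25, p_B = 0.4, p_AB = 0.25
r2, d_prime = ld_from_haplotype_freqs(0.25, 0.4, 0.25)
print("r^2 =", round(r2, 3), " D' =", round(d_prime, 3))
```

The example shows why the two measures differ: D' reaches 1 whenever one of the four possible haplotypes is absent, while r^2 reaches 1 only if the two SNPs also have matching allele frequencies.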

130 LD with unphased diploid genotypes Slatkin 2008 Nat Rev Genet Consider two SNPs with frequencies p_A and p_B of alleles A, B. Let g_A refer to # copies (0, 1, 2) of allele A for the first SNP. Let g_B refer to # copies (0, 1, 2) of allele B for the second SNP.
$$ r^2 = \frac{[E(g_A g_B) - E(g_A)E(g_B)]^2}{Var(g_A)\,Var(g_B)} $$
$$ D' = \frac{p_{AB} - p_A p_B}{p_A - p_A p_B} $$
... cannot be directly computed, since p_AB relies on phased data!

131 Approaches to Scientific Understanding Love is Understanding. -- Madonna Data is Understanding. -- Alkes

132 Linkage Disequilibrium: Haplotype Blocks Haplotype blocks in 216kb region (MHC, chr 6). x-axis = y-axis = SNP position in region (0 kb to 200 kb). D and L are measures of LD (related to r^2). Red indicates high LD; black indicates low LD. Also see Haploview program, Barrett et al Bioinformatics. Slatkin 2008 Nat Rev Genet

133 Linkage Disequilibrium: Haplotype Blocks Europeans and Asians Africans Gabriel et al Science also see Reich 2001 Nature, Daly 2001 Nat Genet

134 Linkage Disequilibrium: Haplotype Blocks African chromosomes: 50% of the genome lies in haplotype blocks >22kb. Europeans and Asians: 50% of the genome lies in haplotype blocks >44kb. Longer haplotype blocks in Europeans/Asians are due to the out-of-Africa population bottleneck: they are descended from a small number of ancestors who left Africa kya. Gabriel et al Science also see Reich 2001 Nature, Daly 2001 Nat Genet

135 A brief history of modern humans Cavalli-Sforza & Feldman 2003 Nat Genet; also see Ramachandran et al PNAS, Mellars 2006 Science, Armitage et al Science, Henn et al PNAS

136 A brief history of modern humans, contradicted All non-African populations have ~2% of their genomes descended from Neanderthals. Melanesian populations have ~5% of their genomes descended from Denisovans, a relative of Neanderthals. Green et al Science, Reich et al Nature, Meyer et al Science, Sankararaman et al Nature, Vernot & Akey 2014 Science; reviewed in Racimo et al Nat Rev Genet

137 Population bottlenecks increase LD population bottleneck population bottleneck Cavalli-Sforza & Feldman 2003 Nat Genet; also see Ramachandran et al PNAS, Mellars 2006 Science, Armitage et al Science, Henn et al PNAS

138 Population bottlenecks increase LD [Figure: haplotypes of individuals, with SNP 2 and SNP 3 marked] r^2 = 0, NOT in LD (r^2 is squared correlation)

140 Population bottlenecks increase LD due to subsampling haplotypes (genetic drift) [Figure: subsampled haplotypes, with SNP 2 and SNP 3 marked] r^2 = 0.5, partial LD (r^2 is squared correlation)
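
The drift effect can be illustrated with a toy simulation. This is a sketch under assumed numbers, not taken from the slides: a large population with two SNPs in perfect linkage equilibrium (r^2 = 0), a bottleneck of 10 founder haplotypes drawn at random, and 2000 replicate bottlenecks.

```python
import random

def r2_from_haplotypes(haps):
    """r^2 between two SNPs from a list of (allele1, allele2) haplotypes coded 0/1."""
    n = len(haps)
    p_a = sum(h[0] for h in haps) / n
    p_b = sum(h[1] for h in haps) / n
    p_ab = sum(h[0] * h[1] for h in haps) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # one SNP became monomorphic in this sample
    return (p_ab - p_a * p_b) ** 2 / denom

# Pre-bottleneck population: all four haplotypes equally common, so r^2 = 0
population = [(0, 0), (0, 1), (1, 0), (1, 1)] * 250
r2_before = r2_from_haplotypes(population)

rng = random.Random(511)   # fixed seed for reproducibility
n_founders = 10            # assumed bottleneck size (founder haplotypes)
values = []
for _ in range(2000):
    founders = rng.choices(population, k=n_founders)
    r2 = r2_from_haplotypes(founders)
    if r2 is not None:
        values.append(r2)
mean_r2_after = sum(values) / len(values)
print(r2_before, mean_r2_after)
```

With so few founders, chance alone over- or under-represents some haplotypes, so the average r^2 after the bottleneck is clearly above zero; it shrinks as the number of founders grows.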

142 Population bottlenecks increase LD Average number of haplotypes per genomic region Conrad et al Nat Genet

143 Outline 1. Introduction to Linkage Disequilibrium 2. LD and Tag SNPs 3. LD and imputation 4. LD and fine-mapping

144 Linkage Disequilibrium and tag SNPs Direct association: genotype SNP1 in Cases and Controls. Cases Individuals Controls 3 billion letters SNP 1: causal SNP

145 Linkage Disequilibrium and tag SNPs Indirect association: genotype SNP2 in Cases and Controls. If SNP1 affects disease risk, then SNP2 will also be associated! SNP1 and SNP2: r^2 = 1, in LD

146 Linkage Disequilibrium and tag SNPs Indirect association: genotype SNP3 in Cases and Controls. If SNP1 affects disease risk, then SNP3 will also be associated! SNP1 and SNP3: r^2 = 0.7, partial LD

149 Linkage Disequilibrium and tag SNPs Theorem 2 (Pritchard and Przeworski 2001 Am J Hum Genet): If SNP1 is causal and LD(SNP1,SNP2) = r^2, then Power of an association study of SNP1 with N samples = Power of an association study of SNP2 with N/r^2 samples.
Proof: Let g1 and g2 be genotypes of SNP1 and SNP2 respectively and π be phenotype, all normalized to mean 0 and variance 1. Armitage Trend Test (χ^2 = N ρ(g, π)^2; Armitage 1955 Biometrics):
SNP1 with N samples: N ρ(g1, π)^2 = N E(g1 π)^2
SNP2 with N/r^2 samples: (N/r^2) ρ(g2, π)^2 = (N/r^2) E(g2 π)^2 = (N/r^2) E([r g1 + (g2 - r g1)] π)^2 = (N/r^2) E(r g1 π)^2 = N E(g1 π)^2.
The third equality holds because the residual g2 - r g1 is uncorrelated with g1, and SNP2 is associated with π only through the causal SNP1, so E((g2 - r g1) π) = 0. Q.E.D.
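
A minimal sketch of the arithmetic in the theorem. The function names and the example numbers (N = 1000, ρ = 0.1) are illustrative assumptions; r^2 = 0.7 is the partial-LD value used in the earlier tag SNP example.

```python
import math

def tag_snp_sample_size(n_causal, r2):
    """Samples needed at a tag SNP to match the power of n_causal samples at the causal SNP."""
    return math.ceil(n_causal / r2)

def trend_statistic(n, rho_causal, r2=1.0):
    """Expected trend-test signal N * rho^2; at a tag SNP, rho^2 is scaled down by r^2."""
    return n * r2 * rho_causal ** 2

n_tag = tag_snp_sample_size(1000, 0.7)
print(n_tag)                                    # 1429
print(trend_statistic(1000, 0.1))               # signal at the causal SNP
print(trend_statistic(1000 / 0.7, 0.1, r2=0.7)) # same signal at the tag SNP
```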

150 Linkage Disequilibrium: Haplotype Blocks [Figure: haplotypes of 5 cases and 5 controls, with the risk haplotype marked] Question: Which SNP to genotype? Answer: Choose 1 SNP per haplotype block, and take advantage of indirect association!

151 Linkage Disequilibrium: Haplotype Blocks [Figure: haplotypes of 5 cases and 5 controls, with the risk haplotype marked] Needed: a resource describing the haplotypes at each location in the genome.

152 The International HapMap Project: 270 samples from 4 populations
CEU: European, USA (trios)
YRI: Yoruba, Nigeria (trios)
CHB: Chinese, China (45 unrelated)
JPT: Japanese, Japan (45 unrelated)

153 Genetic differences between populations are small [Figure: allele frequencies (e.g. 50%, 51%) of the C allele of one SNP and the A allele of a second SNP 11kb away on chr 1, compared across populations]

154 LD differences between populations are large! [Figure: the same two SNPs 11kb apart on chr 1, with r^2 between them shown for each population]

155 HapMap project: a resource for SNP tagging [Figure: haplotypes of individuals, with SNP 1, SNP 2 and SNP 3 marked] SNP1 tags this entire haplotype block at an r^2 of 0.7

156 HapMap project: a resource for SNP tagging How to select SNPs to genotype in an association study:
1. Choose genomic region(s) of interest.
2. Look up HapMap SNPs in the genomic region(s).
3. Choose a subset of HapMap SNPs which tag haplotype blocks in the genomic region(s). (e.g. Tagger algorithm, de Bakker et al Nat Genet)
Note: because LD patterns vary by population, it is important to choose tag SNPs using a HapMap population similar to the population in the association study.
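
Step 3 can be sketched as a simple greedy selection. This is an illustrative stand-in, not the Tagger algorithm itself: it repeatedly picks the SNP that tags the most still-untagged SNPs at a chosen r^2 threshold. The r^2 matrix below is made up.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs so that every SNP has r^2 >= threshold with some tag (or is a tag)."""
    n = len(r2)
    # covers[i] = SNPs that SNP i would tag, including itself
    covers = [{j for j in range(n) if i == j or r2[i][j] >= threshold} for i in range(n)]
    untagged = set(range(n))
    tags = []
    while untagged:
        best = max(range(n), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags

# Hypothetical pairwise r^2 for 5 SNPs: SNPs 0-2 form one block, SNPs 3-4 another
r2_matrix = [
    [1.00, 0.90, 0.85, 0.10, 0.00],
    [0.90, 1.00, 0.70, 0.10, 0.00],
    [0.85, 0.70, 1.00, 0.20, 0.10],
    [0.10, 0.10, 0.20, 1.00, 0.95],
    [0.00, 0.00, 0.10, 0.95, 1.00],
]
tags = greedy_tag_snps(r2_matrix)
print(tags)  # [0, 3]
```

The r^2 matrix should be estimated in a reference population similar to the study population, since the same SNPs can need different tags in different populations.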

157 HapMap project: a resource for SNP tagging How many tag SNPs are required? For the entire genome, the answer is: Thus, to choose tag SNPs at an r^2 of 0.8, we need roughly 1 SNP per 3kb in YRI, or 1 SNP per 5kb in CEU or CHB+JPT. International HapMap Consortium 2007 Nature; also see Barrett et al Nat Genet, Smith et al Genomics, International HapMap Consortium 2005 Nature

160 Things aren't always what they seem Estimating LD using a small number of HapMap samples may lead to overfitting. HapMap SNPs are not a random subset of SNPs. Bhangale et al Nat Genet

161 Things aren't always what they seem According to International HapMap Consortium 2007 Nature: 82% of common SNPs are tagged at an r^2 threshold of 0.8 by Affymetrix 6.0. According to Bhangale et al Nat Genet: 66% of common SNPs are tagged at an r^2 threshold of 0.8 by Affymetrix 6.0.

162 Multi-SNP tagging
Haplotype [freq. 25% for each haplotype]
SNP1 (causal):  A    A    C    C
SNP2:           A    C    C    A
SNP3:           A    C    A    C
SNP1 vs. SNP2 alone or SNP3 alone: r^2 = 0, NOT in LD

163 Multi-SNP tagging
Haplotype [freq. 25% for each haplotype]
SNP1 (causal):  A    A    C    C
SNP2+3:         A+A  C+C  C+A  A+C
SNP1 vs. the SNP2+3 combination: r^2 = 1, YES in LD
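
The multi-SNP tagging example can be verified directly. A minimal sketch using the four haplotypes above, with allele A coded as 1; the combined marker is an assumed encoding that equals 1 when SNP2 and SNP3 carry the same allele (A+A or C+C).

```python
def r2(x, y):
    """Squared correlation between two equally weighted 0/1 vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    vx = sum(a * a for a in x) / n - mx ** 2
    vy = sum(b * b for b in y) / n - my ** 2
    return cov ** 2 / (vx * vy)

# (SNP1, SNP2, SNP3) alleles on the four haplotypes, each at 25% frequency
haplotypes = [("A", "A", "A"), ("A", "C", "C"), ("C", "C", "A"), ("C", "A", "C")]

snp1 = [int(h[0] == "A") for h in haplotypes]
snp2 = [int(h[1] == "A") for h in haplotypes]
snp3 = [int(h[2] == "A") for h in haplotypes]
snp2_3 = [int(h[1] == h[2]) for h in haplotypes]  # combined two-SNP marker

print(r2(snp1, snp2), r2(snp1, snp3), r2(snp1, snp2_3))  # 0.0 0.0 1.0
```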

164 Multi-SNP tagging Pe'er et al Nat Genet also see Zaitlen et al Am J Hum Genet

165 Outline 1. Introduction to Linkage Disequilibrium 2. LD and Tag SNPs 3. LD and imputation 4. LD and fine-mapping

166 What is imputation? Marchini et al Nat Genet, Howie et al PLoS Genet, Li et al Genet Epidemiol, Howie et al Nat Genet, Fuchsberger et al Bioinformatics

168 Imputation: Why try? Increase power to detect disease association at untyped causal SNP (imputed causal SNP may have stronger association than tag SNP)

169 Imputation: Why try? [Figure: association signal at a tag SNP with r^2 = 0.8 to the causal SNP] Marchini et al Nat Genet, Howie et al PLoS Genet, Li et al Genet Epidemiol, Howie et al Nat Genet, Fuchsberger et al Bioinformatics

170 Imputation: Why try? [Figure: association signal at the causal SNP] Marchini et al Nat Genet, Howie et al PLoS Genet, Li et al Genet Epidemiol, Howie et al Nat Genet, Fuchsberger et al Bioinformatics

173 Imputation: Why try?
1. Increase power to detect disease association at untyped causal SNP (imputed causal SNP may have stronger association than tag SNP)
2. Enable meta-analysis of studies on Affymetrix + Illumina chips
3. Improve genotype data quality

174 Imputation: Algorithms
Hidden Markov Model (HMM) based approaches:
  IMPUTE (Marchini et al Nat Genet, Howie et al PLoS Genet, Howie et al Nat Genet)
  MACH (Li et al Genet Epidemiol)
  fastPHASE/BIMBAM (Scheet/Stephens 2006 AJHG, Servin/Stephens 2007 PLoS Genet, Guan/Stephens 2008 PLoS Genet)
  GEDI (Kennedy et al ISBRA)
Localized Haplotype Clustering:
  BEAGLE (Browning/Browning 2007 AJHG, Browning/Browning 2009 AJHG)
Likelihood-based approaches:
  UNPHASED (Dudbridge 2008 Hum Hered)
  SNPMStat (Lin et al AJHG)
reviewed in Marchini et al Nat Rev Genet; also see Li et al ARGHG

175 Imputation: What do the algorithms output?
Integer-valued genotypes at untyped SNPs, e.g. genotype = 2
OR
Continuous genotype dosages at untyped SNPs, e.g. genotype dosage = 1.79
OR
Continuous genotype probabilities at untyped SNPs, e.g. genotype probabilities P(0) = 0.01, P(1) = 0.19, P(2) = 0.80
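
The three output types are related: the dosage is the expected allele count under the genotype probabilities, and the integer genotype is the most probable one. A minimal sketch using the example values above (function names are illustrative):

```python
def dosage(probs):
    """Expected number of copies of the allele: 0*P(0) + 1*P(1) + 2*P(2)."""
    return sum(g * p for g, p in enumerate(probs))

def best_guess(probs):
    """Most probable integer genotype (0, 1 or 2)."""
    return max(range(len(probs)), key=lambda g: probs[g])

probs = (0.01, 0.19, 0.80)        # P(0), P(1), P(2)
print(round(dosage(probs), 2))    # 1.79
print(best_guess(probs))          # 2
```

Dosages keep the imputation uncertainty that is thrown away when rounding to an integer genotype.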

176 Imputation: People do it. reviewed in Marchini et al Nat Rev Genet; also see Li et al ARGHG

177 HMM-based imputation approaches [Figure: reference haplotypes hap1 to hap5, and an imputed haplotype (Imp.) whose untyped positions (???) are filled in from the reference haplotypes] Note: current paradigm is to first phase the data, then run imputation on phased data (Howie et al Nat Genet, Fuchsberger et al Bioinformatics). Reviewed in Marchini et al Nat Rev Genet; also see Li et al ARGHG
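
The idea behind HMM-based imputation can be sketched with a heavily simplified haploid copying model. This is a toy illustration, not the algorithm of IMPUTE, MACH or any other cited program: the hidden state is which reference haplotype the target is copying, with an assumed constant switch probability between adjacent SNPs and an assumed mismatch (error) probability at typed SNPs. All names and parameter values are hypothetical.

```python
def impute_haploid(ref_haps, target, switch=0.05, err=0.01):
    """Posterior allele dosage at every SNP of a phased target haplotype.

    ref_haps: list of reference haplotypes (lists of 0/1 alleles).
    target:   list of 0/1 alleles, with None at untyped SNPs.
    """
    k_states, n_snps = len(ref_haps), len(target)

    def emit(k, j):
        if target[j] is None:
            return 1.0  # untyped SNP carries no information
        return 1.0 - err if ref_haps[k][j] == target[j] else err

    # Forward pass: probability of observations up to SNP j, copying haplotype k at j
    fwd = [[0.0] * k_states for _ in range(n_snps)]
    for k in range(k_states):
        fwd[0][k] = emit(k, 0) / k_states
    for j in range(1, n_snps):
        total = sum(fwd[j - 1])
        for k in range(k_states):
            stay_or_switch = (1 - switch) * fwd[j - 1][k] + switch * total / k_states
            fwd[j][k] = stay_or_switch * emit(k, j)

    # Backward pass: probability of observations after SNP j, given state k at j
    bwd = [[1.0] * k_states for _ in range(n_snps)]
    for j in range(n_snps - 2, -1, -1):
        weighted = [bwd[j + 1][k] * emit(k, j + 1) for k in range(k_states)]
        total = sum(weighted)
        for k in range(k_states):
            bwd[j][k] = (1 - switch) * weighted[k] + switch * total / k_states

    # Posterior-weighted average of the reference alleles at each SNP
    dosages = []
    for j in range(n_snps):
        post = [fwd[j][k] * bwd[j][k] for k in range(k_states)]
        z = sum(post)
        dosages.append(sum(p * ref_haps[k][j] for k, p in enumerate(post)) / z)
    return dosages

ref_haps = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 1, 0, 1, 0]]
target = [1, None, 1, None, 1]   # typed at SNPs 0, 2, 4; untyped at SNPs 1, 3
dosages = impute_haploid(ref_haps, target)
print([round(d, 3) for d in dosages])
```

The target matches the second reference haplotype at every typed SNP, so the imputed dosages at the untyped SNPs are close to that haplotype's alleles. Real programs work with diploid or pre-phased data, position-dependent recombination rates, and much larger reference panels.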

182 Measuring imputation accuracy
Concordance rate: % of genotypes (or alleles) imputed correctly
  Natural analogue of genotyping error rate in QC analyses
  Concordance rate is often in the range of 95-99%.
Squared correlation (r^2) between true and imputed genotype
  Natural analogue of r^2 between causal SNP and tag SNP
  r^2 << concordance rate, particularly for rare SNPs.
Normalized difference between true and imputed allele frequency
  Measures whether imputation is biased towards ref or var allele
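
Why r^2 can be far below the concordance rate for rare SNPs is easy to see on made-up data. In the sketch below (all genotypes hypothetical), 4 of 100 individuals carry the rare allele; the imputation misses 2 carriers and adds 1 false carrier.

```python
def concordance(true, imputed):
    """Fraction of genotypes imputed exactly right."""
    return sum(t == i for t, i in zip(true, imputed)) / len(true)

def r2(x, y):
    """Squared correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    vx = sum(a * a for a in x) / n - mx ** 2
    vy = sum(b * b for b in y) / n - my ** 2
    return cov ** 2 / (vx * vy)

true_genotypes = [1] * 4 + [0] * 96              # 4 heterozygous carriers out of 100
imputed_genotypes = [1, 1, 0, 0, 1] + [0] * 95   # 2 carriers missed, 1 false carrier

conc = concordance(true_genotypes, imputed_genotypes)
acc_r2 = r2(true_genotypes, imputed_genotypes)
print(conc, round(acc_r2, 2))  # 0.97 0.32
```

Almost all non-carriers are trivially "correct", which inflates concordance; r^2 reflects how well the carriers, who drive the association signal, are recovered.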

184 Imputation using HapMap data
Common SNPs imputed using HapMap2 CEU (N=120): r^2 = 0.95
Common SNPs imputed using HapMap3 CEU+TSI (N=410): r^2 = 0.96
(European-ancestry WTCCC samples, Affymetrix & Illumina chips)
International HapMap3 Consortium 2010 Nature

185 Imputation using HapMap data x-axis: MAF<5% SNPs, imputed using HapMap2 CEU (N=120) y-axis: MAF<5% SNPs, imputed using HapMap3 CEU+TSI (N=410) International HapMap3 Consortium 2010 Nature

189 Low-coverage sequencing + imputation increases power vs. genotyping arrays
Effective sample size of a GWAS with a $300,000 budget:

                     Cost per sample   Actual #samples   Average imputation r^2   Effective #samples
Illumina 1M array    $
0.4x sequencing      $83*              3,600             0.81**                   2,900
0.1x sequencing      $43*              7,000             0.64**                   4,500

*Based on sample preparation cost of $30/sample, which is conservatively double the $15/sample reported by Rohland & Reich 2012 Genome Res, and on $133 per 1x sequencing (Illumina Network cost).
**Imputation r^2 attained at Illumina 1M SNPs by downsampling reads from real off-target exome sequencing data. Relative performance of low-coverage sequencing will be even higher at non-Illumina 1M SNPs.
Pasaniuc et al Nat Genet; also see Cai et al Nature, Davies et al Nat Genet
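
The effective sample sizes follow from a simple calculation: the number of samples the budget buys, multiplied by the average imputation r^2. A minimal sketch using the per-sample costs ($83, $43) and r^2 values (0.81, 0.64) quoted above; the function name is illustrative.

```python
def effective_samples(budget, cost_per_sample, imputation_r2):
    """Return (actual #samples affordable, effective #samples = actual * r^2)."""
    n_actual = budget // cost_per_sample
    return n_actual, n_actual * imputation_r2

n_83, eff_83 = effective_samples(300_000, 83, 0.81)
n_43, eff_43 = effective_samples(300_000, 43, 0.64)
print(n_83, round(eff_83))  # 3614 2927
print(n_43, round(eff_43))  # 6976 4465
```

Cheaper, lower-coverage sequencing gives a lower imputation r^2 per sample, but the larger number of affordable samples more than compensates in this example.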

190 Outline 1. Introduction to Linkage Disequilibrium 2. LD and Tag SNPs 3. LD and imputation 4. LD and fine-mapping (to be continued, Tue of Week 4)

191 Definition of fine-mapping Which of these SNPs on chr 6 is the biologically causal SNP? (Ditto for chr 5, 8, 12, 19) Manhattan plot from Ikram et al PLoS Genet

192 WTCCC fine-mapping study Maller et al Nat Genet

193 LD and fine-mapping in Europeans GWAS in Europeans SNP1: P-value = 10^-8

194 TCF7L2 locus in T2D: 1 top signal Maller et al Nat Genet

195 LD and fine-mapping in Europeans Fine-mapping in Europeans SNP1: P-value = 10^-8 CAUSAL?? SNP2: P-value = 10^-8 CAUSAL??

196 FTO locus in T2D: many top signals Maller et al Nat Genet

197 LD and cross-population fine-mapping

        Fine-mapping in Europeans   Fine-mapping in Africans
SNP1    P-value = 10^-8             P-value = 10^-5
SNP2    P-value = 10^-8             P-value = 0.62
SNP3    P-value = 0.41              P-value = 10^-5

[Tables: pairwise r^2 LD among SNP1, SNP2 and SNP3 in Europeans and in Africans]

198 LD and cross-population fine-mapping SNP1 (P-value = 10^-8 in Europeans, 10^-5 in Africans): CAUSAL
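
The reasoning in this example can be written as a naive filter: keep only SNPs that are associated in both populations. This is a toy illustration with an assumed significance threshold, not a fine-mapping method; real multi-ethnic approaches model the LD in each population jointly.

```python
# P-values from the example above
p_values = {
    "SNP1": {"Europeans": 1e-8, "Africans": 1e-5},
    "SNP2": {"Europeans": 1e-8, "Africans": 0.62},
    "SNP3": {"Europeans": 0.41, "Africans": 1e-5},
}

threshold = 1e-4  # assumed cutoff, chosen only for illustration

# A causal SNP should be associated in every population; a SNP that merely tags it
# will lose its signal where LD with the causal SNP is weak.
candidates = [
    snp for snp, by_pop in p_values.items()
    if all(p < threshold for p in by_pop.values())
]
print(candidates)  # ['SNP1']
```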

199 LD and multi-ethnic fine-mapping Zaitlen*, Pasaniuc* et al Am J Hum Genet also see Morris 2011 Genet Epidemiol, Udler et al Hum Mol Genet, Wu et al PLoS Genet, Peters et al PLoS Genet, Liu et al Am J Hum Genet


Why can GBS be complicated? Tools for filtering, error correction and imputation. Why can GBS be complicated? Tools for filtering, error correction and imputation. Edward Buckler USDA-ARS Cornell University http://www.maizegenetics.net Many Organisms Are Diverse Humans are at the lower

More information

Genome variation - part 1

Genome variation - part 1 Genome variation - part 1 Dr Jason Wong Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW Day 2 Friday 21 th January 2016 Aims of the session Introduce major

More information

An introduction to genetics and molecular biology

An introduction to genetics and molecular biology An introduction to genetics and molecular biology Cavan Reilly September 5, 2017 Table of contents Introduction to biology Some molecular biology Gene expression Mendelian genetics Some more molecular

More information

Park /12. Yudin /19. Li /26. Song /9

Park /12. Yudin /19. Li /26. Song /9 Each student is responsible for (1) preparing the slides and (2) leading the discussion (from problems) related to his/her assigned sections. For uniformity, we will use a single Powerpoint template throughout.

More information

Computational Haplotype Analysis: An overview of computational methods in genetic variation study

Computational Haplotype Analysis: An overview of computational methods in genetic variation study Computational Haplotype Analysis: An overview of computational methods in genetic variation study Phil Hyoun Lee Advisor: Dr. Hagit Shatkay A depth paper submitted to the School of Computing conforming

More information

Genetics and Psychiatric Disorders Lecture 1: Introduction

Genetics and Psychiatric Disorders Lecture 1: Introduction Genetics and Psychiatric Disorders Lecture 1: Introduction Amanda J. Myers LABORATORY OF FUNCTIONAL NEUROGENOMICS All slides available @: http://labs.med.miami.edu/myers Click on courses First two links

More information

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs (3) QTL and GWAS methods By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs Under what conditions particular methods are suitable

More information

Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies

Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies p. 1/20 Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies David J. Balding Centre

More information

"Genetics in geographically structured populations: defining, estimating and interpreting FST."

Genetics in geographically structured populations: defining, estimating and interpreting FST. University of Connecticut DigitalCommons@UConn EEB Articles Department of Ecology and Evolutionary Biology 9-1-2009 "Genetics in geographically structured populations: defining, estimating and interpreting

More information

Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron

Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron Genotype calling Genotyping methods for Affymetrix arrays Genotyping

More information

Population Genetics in the Genomic Era

Population Genetics in the Genomic Era Harvard-MIT Division of Health Sciences and Technology HST.512: Genomic Medicine Prof. Marco F. Ramoni Population Genetics in the Genomic Era Marco F. Ramoni Children s Hospital Informatics Program and

More information

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score Midterm 1 Results 10 Midterm 1 Akey/ Fields Median - 69 8 Number of Students 6 4 2 0 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 Exam Score Quick review of where we left off Parental type: the

More information

Characterization of Allele-Specific Copy Number in Tumor Genomes

Characterization of Allele-Specific Copy Number in Tumor Genomes Characterization of Allele-Specific Copy Number in Tumor Genomes Hao Chen 2 Haipeng Xing 1 Nancy R. Zhang 2 1 Department of Statistics Stonybrook University of New York 2 Department of Statistics Stanford

More information

Author's response to reviews

Author's response to reviews Author's response to reviews Title: A pooling-based genome-wide analysis identifies new potential candidate genes for atopy in the European Community Respiratory Health Survey (ECRHS) Authors: Francesc

More information

Linking Genetic Variation to Important Phenotypes

Linking Genetic Variation to Important Phenotypes Linking Genetic Variation to Important Phenotypes BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2018 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under

More information

Efficient Algorithms for SNP Genotype Data Analysis using Hidden Markov Models of Haplotype Diversity

Efficient Algorithms for SNP Genotype Data Analysis using Hidden Markov Models of Haplotype Diversity Efficient Algorithms for SNP Genotype Data Analysis using Hidden Markov Models of Haplotype Diversity Justin Kennedy University of Connecticut, 2009 Advances in SNP genotyping technologies have played

More information

Title: Powerful SNP Set Analysis for Case-Control Genome Wide Association Studies. Running Title: Powerful SNP Set Analysis. Hill, NC. MD.

Title: Powerful SNP Set Analysis for Case-Control Genome Wide Association Studies. Running Title: Powerful SNP Set Analysis. Hill, NC. MD. Title: Powerful SNP Set Analysis for Case-Control Genome Wide Association Studies Running Title: Powerful SNP Set Analysis Michael C. Wu 1, Peter Kraft 2,3, Michael P. Epstein 4, Deanne M. Taylor 2, Stephen

More information

5/18/2017. Genotypic, phenotypic or allelic frequencies each sum to 1. Changes in allele frequencies determine gene pool composition over generations

5/18/2017. Genotypic, phenotypic or allelic frequencies each sum to 1. Changes in allele frequencies determine gene pool composition over generations Topics How to track evolution allele frequencies Hardy Weinberg principle applications Requirements for genetic equilibrium Types of natural selection Population genetic polymorphism in populations, pp.

More information

ARTICLE. Ke Hao 1, Cheng Li 1,2, Carsten Rosenow 3 and Wing H Wong*,1,4

ARTICLE. Ke Hao 1, Cheng Li 1,2, Carsten Rosenow 3 and Wing H Wong*,1,4 (2004) 12, 1001 1006 & 2004 Nature Publishing Group All rights reserved 1018-4813/04 $30.00 www.nature.com/ejhg ARTICLE Detect and adjust for population stratification in population-based association study

More information

Data Sources and Biobanks in the Asia-Pacific Region. Wei Zhou, MD, Ph.D. Department of Epidemiology, Merck Research Laboratories October 23, 2014

Data Sources and Biobanks in the Asia-Pacific Region. Wei Zhou, MD, Ph.D. Department of Epidemiology, Merck Research Laboratories October 23, 2014 Data Sources and Biobanks in the Asia-Pacific Region Wei Zhou, MD, Ph.D. Department of Epidemiology, Merck Research Laboratories October 23, 2014 1 Disclosures Wei Zhou is currently an employee of Merck

More information

Human linkage analysis. fundamental concepts

Human linkage analysis. fundamental concepts Human linkage analysis fundamental concepts Genes and chromosomes Alelles of genes located on different chromosomes show independent assortment (Mendel s 2nd law) For 2 genes: 4 gamete classes with equal

More information

Quality Control Report for Exome Chip Data University of Michigan April, 2015

Quality Control Report for Exome Chip Data University of Michigan April, 2015 Quality Control Report for Exome Chip Data University of Michigan April, 2015 Project: Health and Retirement Study Support: U01AG009740 NIH Institute: NIA 1. Summary and recommendations for users A total

More information

REVIEWS GENOME-WIDE ASSOCIATION STUDIES FOR COMMON DISEASES AND COMPLEX TRAITS. Joel N. Hirschhorn* and Mark J. Daly*

REVIEWS GENOME-WIDE ASSOCIATION STUDIES FOR COMMON DISEASES AND COMPLEX TRAITS. Joel N. Hirschhorn* and Mark J. Daly* GENOME-WIDE ASSOCIATION STUDIES FOR COMMON DISEASES AND COMPLEX TRAITS Joel N. Hirschhorn* and Mark J. Daly* Abstract Genetic factors strongly affect susceptibility to common diseases and also influence

More information

S SG. Metabolomics meets Genomics. Hemant K. Tiwari, Ph.D. Professor and Head. Metabolomics: Bench to Bedside. ection ON tatistical.

S SG. Metabolomics meets Genomics. Hemant K. Tiwari, Ph.D. Professor and Head. Metabolomics: Bench to Bedside. ection ON tatistical. S SG ection ON tatistical enetics Metabolomics meets Genomics Hemant K. Tiwari, Ph.D. Professor and Head Section on Statistical Genetics Department of Biostatistics School of Public Health Metabolomics:

More information

Lecture 10: Introduction to Genetic Drift. September 28, 2012

Lecture 10: Introduction to Genetic Drift. September 28, 2012 Lecture 10: Introduction to Genetic Drift September 28, 2012 Announcements Exam to be returned Monday Mid-term course evaluation Class participation Office hours Last Time Transposable Elements Dominance

More information

Introduction to Genome Wide Association Studies 2015 Sydney Brenner Institute for Molecular Bioscience Shaun Aron

Introduction to Genome Wide Association Studies 2015 Sydney Brenner Institute for Molecular Bioscience Shaun Aron Introduction to Genome Wide Association Studies 2015 Sydney Brenner Institute for Molecular Bioscience Shaun Aron Many sources of technical bias in a genotyping experiment DNA sample quality and handling

More information

Update on the Genomics Data in the Health and Re4rement Study. Sharon Kardia Jennifer A. Smith University of Michigan April 2013

Update on the Genomics Data in the Health and Re4rement Study. Sharon Kardia Jennifer A. Smith University of Michigan April 2013 Update on the Genomics Data in the Health and Re4rement Study Sharon Kardia Jennifer A. Smith University of Michigan April 2013 Genetic variation in SNPs (Single Nucleotide Polymorphisms) ATTGCAATCCGTGG...ATCGAGCCA.TACGATTGCACGCCG

More information

An introductory overview of the current state of statistical genetics

An introductory overview of the current state of statistical genetics An introductory overview of the current state of statistical genetics p. 1/9 An introductory overview of the current state of statistical genetics Cavan Reilly Division of Biostatistics, University of

More information

Personal Genomics Platform White Paper Last Updated November 15, Executive Summary

Personal Genomics Platform White Paper Last Updated November 15, Executive Summary Executive Summary Helix is a personal genomics platform company with a simple but powerful mission: to empower every person to improve their life through DNA. Our platform includes saliva sample collection,

More information

Detecting selection on nucleotide polymorphisms

Detecting selection on nucleotide polymorphisms Detecting selection on nucleotide polymorphisms Introduction At this point, we ve refined the neutral theory quite a bit. Our understanding of how molecules evolve now recognizes that some substitutions

More information

Statistical Tests for Admixture Mapping with Case-Control and Cases-Only Data

Statistical Tests for Admixture Mapping with Case-Control and Cases-Only Data Am. J. Hum. Genet. 75:771 789, 2004 Statistical Tests for Admixture Mapping with Case-Control and Cases-Only Data Giovanni Montana and Jonathan K. Pritchard Department of Human Genetics, University of

More information

Sequencing Millions of Animals for Genomic Selection 2.0

Sequencing Millions of Animals for Genomic Selection 2.0 Proceedings, 10 th World Congress of Genetics Applied to Livestock Production Sequencing Millions of Animals for Genomic Selection 2.0 J.M. Hickey 1, G. Gorjanc 1, M.A. Cleveland 2, A. Kranis 1,3, J. Jenko

More information

ARTICLE Rapid and Accurate Haplotype Phasing and Missing-Data Inference for Whole-Genome Association Studies By Use of Localized Haplotype Clustering

ARTICLE Rapid and Accurate Haplotype Phasing and Missing-Data Inference for Whole-Genome Association Studies By Use of Localized Haplotype Clustering ARTICLE Rapid and Accurate Haplotype Phasing and Missing-Data Inference for Whole-Genome Association Studies By Use of Localized Haplotype Clustering Sharon R. Browning * and Brian L. Browning * Whole-genome

More information

Random Allelic Variation

Random Allelic Variation Random Allelic Variation AKA Genetic Drift Genetic Drift a non-adaptive mechanism of evolution (therefore, a theory of evolution) that sometimes operates simultaneously with others, such as natural selection

More information

Fine-scale mapping of meiotic recombination in Asians

Fine-scale mapping of meiotic recombination in Asians Fine-scale mapping of meiotic recombination in Asians Supplementary notes X chromosome hidden Markov models Concordance analysis Alternative simulations Algorithm comparison Mongolian exome sequencing

More information

Chapter 11: Genome-Wide Association Studies

Chapter 11: Genome-Wide Association Studies Education Chapter 11: Genome-Wide Association Studies William S. Bush 1 *, Jason H. Moore 2 1 Department of Biomedical Informatics, Center for Human Genetics Research, Vanderbilt University Medical School,

More information

Linkage Disequilibrium Mapping via Cladistic Analysis of Single-Nucleotide Polymorphism Haplotypes

Linkage Disequilibrium Mapping via Cladistic Analysis of Single-Nucleotide Polymorphism Haplotypes Am. J. Hum. Genet. 75:35 43, 2004 Linkage Disequilibrium Mapping via Cladistic Analysis of Single-Nucleotide Polymorphism Haplotypes Caroline Durrant, 1 Krina T. Zondervan, 1 Lon R. Cardon, 1 Sarah Hunt,

More information

Identifying Selected Regions from Heterozygosity and Divergence Using a Light-Coverage Genomic Dataset from Two Human Populations

Identifying Selected Regions from Heterozygosity and Divergence Using a Light-Coverage Genomic Dataset from Two Human Populations Nova Southeastern University NSUWorks Biology Faculty Articles Department of Biological Sciences 3-5-2008 Identifying Selected Regions from Heterozygosity and Divergence Using a Light-Coverage Genomic

More information

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014 Single Nucleotide Variant Analysis H3ABioNet May 14, 2014 Outline What are SNPs and SNVs? How do we identify them? How do we call them? SAMTools GATK VCF File Format Let s call variants! Single Nucleotide

More information

Gap Filling for a Human MHC Haplotype Sequence

Gap Filling for a Human MHC Haplotype Sequence American Journal of Life Sciences 2016; 4(6): 146-151 http://www.sciencepublishinggroup.com/j/ajls doi: 10.11648/j.ajls.20160406.12 ISSN: 2328-5702 (Print); ISSN: 2328-5737 (Online) Gap Filling for a Human

More information

Structural variation. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona

Structural variation. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona Structural variation Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona Genetic variation How much genetic variation is there between individuals? What type of variants

More information

GENOME-WIDE ASSOCIATION STUDIES: THEORETICAL AND PRACTICAL CONCERNS

GENOME-WIDE ASSOCIATION STUDIES: THEORETICAL AND PRACTICAL CONCERNS GENOME-WIDE ASSOCIATION STUDIES: THEORETICAL AND PRACTICAL CONCERNS William Y. S. Wang*,Bryan J. Barratt*,David G. Clayton* and John A. Todd* Abstract To fully understand the allelic variation that underlies

More information

ACCEPTED. Victoria J. Wright Corresponding author.

ACCEPTED. Victoria J. Wright Corresponding author. The Pediatric Infectious Disease Journal Publish Ahead of Print DOI: 10.1097/INF.0000000000001183 Genome-wide association studies in infectious diseases Eleanor G. Seaby 1, Victoria J. Wright 1, Michael

More information

Measurement of Molecular Genetic Variation. Forces Creating Genetic Variation. Mutation: Nucleotide Substitutions

Measurement of Molecular Genetic Variation. Forces Creating Genetic Variation. Mutation: Nucleotide Substitutions Measurement of Molecular Genetic Variation Genetic Variation Is The Necessary Prerequisite For All Evolution And For Studying All The Major Problem Areas In Molecular Evolution. How We Score And Measure

More information

Data Mining and Applications in Genomics

Data Mining and Applications in Genomics Data Mining and Applications in Genomics Lecture Notes in Electrical Engineering Volume 25 For other titles published in this series, go to www.springer.com/series/7818 Sio-Iong Ao Data Mining and Applications

More information

MoGUL: Detecting Common Insertions and Deletions in a Population

MoGUL: Detecting Common Insertions and Deletions in a Population MoGUL: Detecting Common Insertions and Deletions in a Population Seunghak Lee 1,2, Eric Xing 2, and Michael Brudno 1,3, 1 Department of Computer Science, University of Toronto, Canada 2 School of Computer

More information

Conifer Translational Genomics Network Coordinated Agricultural Project

Conifer Translational Genomics Network Coordinated Agricultural Project Conifer Translational Genomics Network Coordinated Agricultural Project Genomics in Tree Breeding and Forest Ecosystem Management ----- Module 3 Population Genetics Nicholas Wheeler & David Harry Oregon

More information