
SNP Selection
University of Louisville Center for Genetics and Molecular Medicine, January 10, 2008
Dana Crawford, PhD, Vanderbilt University Center for Human Genetics Research

Outline of Tutorial
Concepts of tagSNPs
LD and haplotype definitions
Haplotype blocks and definitions
Tools to identify tagSNPs

Why Do We Need tagSNPs?
Example: E2F2 (figure)

SNP Genotypes Are Correlated (aka Linkage Disequilibrium)
Linkage disequilibrium is "the nonindependence of alleles at different sites" (Pritchard and Przeworski 2001). The genotype at one site can therefore predict the genotype at another site.

Too Many SNPs to Genotype!
Whole genome: 15,000,000 SNPs; 6,000,000 SNPs with MAF >5%
Average gene (26.5 kb): 130 SNPs; 44 SNPs with MAF >5%
A proportion of these genotypes are correlated.

Measuring Pair-wise SNP Correlations
SNP genotype correlation is described by linkage disequilibrium (LD).
Pair-wise measures of LD: D' and r²
$D = p_{AB} - p_A p_B$, $D' = D / D_{max}$
$r^2 = \dfrac{D^2}{f(a_1)\,f(a_2)\,f(b_1)\,f(b_2)}$
Here $p_{AB}$ is the frequency of the haplotype carrying alleles A and B, $p_A$ and $p_B$ are allele frequencies, and $f(a_1), f(a_2)$ and $f(b_1), f(b_2)$ are the allele frequencies at the first and second SNP.
(Figure: recombination)

LD Statistics: Practical Uses
r² is inversely related to power: to keep the same power, the sample size must be scaled by 1/r². For example, 1,000 cases and 1,000 controls genotyped at the causal SNP (r² = 1.0) give the same power as 1,250 cases and 1,250 controls genotyped at a marker with r² = 0.80. (Figure: power)
D' is related to recombination history: D' = 1 indicates no historical recombination between the two sites; D' < 1 indicates historical recombination.

Where to Find Population LD Statistics
For your gene or region of interest, search:
HapMap
Perlegen: genome.perlegen.com
SeattleSNPs PGA: pga.gs.washington.edu
NIEHS SNPs: egp.gs.washington.edu
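The formulas for D, D' and r² can be made concrete with a few lines of code. This is a minimal Python sketch, assuming you already know the frequency of the A-B haplotype and of each allele; the function names ld_stats and equivalent_sample_size are hypothetical, and D_max follows Lewontin's standard definition, which the slides do not spell out.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A at SNP 1 and allele B at SNP 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b
    # D_max is the largest |D| possible given the allele frequencies
    # (Lewontin's definition); which bound applies depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2 = D^2 / (f(a1) f(a2) f(b1) f(b2)), with f(a1) = p_a, f(a2) = 1 - p_a, etc.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def equivalent_sample_size(n_at_causal_snp, r2):
    """Samples needed at a marker with LD r2 to match n at the causal SNP."""
    return n_at_causal_snp / r2
```

For example, ld_stats(0.2, 0.2, 0.5) gives D' = 1.0 but r² of about 0.25: every A chromosome carries B (no sign of recombination), yet the allele frequencies differ so much that one SNP predicts the other poorly. equivalent_sample_size(1000, 0.80) gives about 1,250, matching the slide.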

(Figures: Visualizing Pair-wise LD)

Where to Find Population LD Statistics
For your gene or region of interest, search:
HapMap
Genome Variation Server
Perlegen: genome.perlegen.com
SeattleSNPs PGA: pga.gs.washington.edu
NIEHS SNPs: egp.gs.washington.edu



Multi-SNP Genotype Correlations (aka Haplotypes)
A haplotype is "a unique combination of genetic markers present in a chromosome" (Hartl & Clark, 1997, p. 57).

Constructing Haplotypes
Collect pedigrees (figure: example two-SNP genotypes of family members)
Allele-specific PCR
Somatic cell hybrids (figure: human-rodent hybrid)
Infer haplotypes statistically: an individual who is C/T at SNP 1 and A/G at SNP 2 could carry either the C-A and T-G haplotypes or the C-G and T-A haplotypes, and inference software estimates which is more likely.
Examples of haplotype inference methods and software: EM algorithm, Haploview, Arlequin, PHASE v2.1, HAPLOTYPER
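The EM algorithm listed above is easiest to see in the two-SNP case, where only double heterozygotes (like the C/T, A/G individual) have ambiguous phase. This is a minimal Python sketch, not the implementation used by Haploview, PHASE or HAPLOTYPER. It assumes Hardy-Weinberg equilibrium and genotypes coded as the count (0, 1 or 2) of a chosen allele at each SNP; resolve_phase and em_two_snp_haplotypes are hypothetical names.

```python
from itertools import product

# Two-SNP haplotypes as (allele at SNP 1, allele at SNP 2), alleles coded 0/1.
HAPLOTYPES = list(product((0, 1), repeat=2))


def resolve_phase(g1, g2):
    """Haplotypes of an individual who is NOT heterozygous at both SNPs.

    g1, g2: copies of allele 1 at SNP 1 and SNP 2 (0, 1 or 2).
    """
    return (int(g1 >= 1), int(g2 >= 1)), (int(g1 == 2), int(g2 == 2))


def em_two_snp_haplotypes(genotypes, max_iter=200, tol=1e-10):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM.

    Under Hardy-Weinberg equilibrium a double heterozygote carries the cis
    pair (1,1)/(0,0) with probability proportional to f11 * f00 and the
    trans pair (1,0)/(0,1) with probability proportional to f10 * f01.
    """
    if not genotypes:
        raise ValueError("no genotypes given")
    known = {h: 0.0 for h in HAPLOTYPES}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase unknown: shared out in the E-step
        else:
            h1, h2 = resolve_phase(g1, g2)
            known[h1] += 1
            known[h2] += 1

    n_chromosomes = 2 * len(genotypes)
    freqs = {h: 1 / len(HAPLOTYPES) for h in HAPLOTYPES}  # uniform start
    for _ in range(max_iter):
        # E-step: expected fraction of double heterozygotes in the cis phase.
        cis = freqs[(1, 1)] * freqs[(0, 0)]
        trans = freqs[(1, 0)] * freqs[(0, 1)]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5

        # M-step: known plus expected haplotype counts over all chromosomes.
        counts = dict(known)
        counts[(1, 1)] += n_double_het * p_cis
        counts[(0, 0)] += n_double_het * p_cis
        counts[(1, 0)] += n_double_het * (1 - p_cis)
        counts[(0, 1)] += n_double_het * (1 - p_cis)
        new_freqs = {h: c / n_chromosomes for h, c in counts.items()}

        converged = max(abs(new_freqs[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs
```

With genotypes [(0, 0), (0, 0), (2, 2), (2, 2), (1, 1)], the phase-known individuals only carry the (0, 0) and (1, 1) haplotypes, so the single double heterozygote is assigned almost entirely to the cis phase and both of those haplotypes end up with frequency close to 0.5.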

Haplotypes in NIEHS SNPs
More than 625 genes re-sequenced (cell cycle, DNA repair/replication, apoptosis)
2 DNA panels:
1: Polymorphism Discovery Resource (PDR90)
2: Europeans, Africans, Hispanics, and Asians
PHASE v2.0 results posted on the website
Interactive tool (VH1) to visualize and sort haplotypes
(Figures: Haplotypes in NIEHS SNPs)



Using LD and Haplotypes to Pick tagSNPs
r² is inversely related to power (the required sample size scales as 1/r²): 1,000 cases and 1,000 controls at r² = 1.0 give the same power as 1,250 cases and 1,250 controls at r² = 0.80. Example methods: Tagger and LDSelect.
D' is related to recombination history: D' = 1 indicates no historical recombination; D' < 1 indicates historical recombination. Example method: haplotype blocks.

LDSelect: Using LD to Pick tagSNPs
LDSelect:
Uses SNP discovery data (not haplotypes)
Groups SNPs whose genotypes are correlated so that the total number of SNPs to genotype is minimized
Maintains the genetic diversity of the locus
Workflow: discovery genotype data, then pair-wise LD, then pick tagSNPs (Carlson et al. AJHG 2004)
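The LDSelect workflow above (pair-wise LD, then bins, then tagSNPs) can be sketched as a greedy binning loop. This is a simplified Python illustration in the spirit of LDSelect, assuming pair-wise r² values are already computed and the MAF filter already applied; it is not the LDSelect program, which for example can report several equivalent tagSNPs per bin where this sketch picks one. greedy_ld_bins and the SNP IDs in the usage note are hypothetical.

```python
def greedy_ld_bins(snps, r2, threshold=0.8):
    """Greedy r^2 binning in the spirit of LDSelect (simplified sketch).

    snps: SNP identifiers that already pass the MAF filter.
    r2: dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2.
    Returns a list of (tag, members) pairs; every member has
    r^2 >= threshold with its bin's tag.
    """
    def linked(a, b):
        return r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = set(snps)
    bins = []
    while remaining:
        # Pick the SNP that captures the most still-unbinned SNPs;
        # sorting first makes ties resolve deterministically.
        candidates = sorted(remaining)
        tag = max(candidates,
                  key=lambda s: sum(linked(s, t) for t in remaining if t != s))
        members = sorted(t for t in remaining if t == tag or linked(tag, t))
        bins.append((tag, members))
        remaining -= set(members)
    return bins
```

With hypothetical SNPs rs1 to rs4, where rs1 has r² of 0.9 with rs2 and 0.85 with rs3 and rs4 is unlinked, the sketch returns two bins tagged by rs1 and rs4, so two assays cover all four SNPs.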

TagSNPs Are Population Specific
(Figure: BLM tagSNPs in European-descent and African-descent samples)
SNP Selection: tagSNP Data

Side Note: Categorizing tagSNPs
SNP context: nonrepetitive preferred over repetitive
Location of SNP: coding preferred over noncoding
Function: nonsynonymous preferred over synonymous
(Figure: Categorizing tagSNPs, LPO example)
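When several SNPs in a bin could serve as the tag, the preferences listed above can be applied as a sort key. This Python sketch assumes each candidate has been annotated with three yes/no attributes; the TagCandidate record and its field names are hypothetical, and ranking context first, then location, then function is an assumption about how to combine the slide's three criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagCandidate:
    """Hypothetical annotation record for one SNP in a tagSNP bin."""
    snp_id: str
    repetitive: bool      # lies in repetitive sequence context
    coding: bool          # lies in a coding region
    nonsynonymous: bool   # changes the amino acid (meaningful only if coding)


def preference_key(c):
    # Python sorts False before True, so each entry is False for the
    # preferred category: nonrepetitive, then coding, then nonsynonymous.
    # The priority order of the three criteria is an assumption.
    return (c.repetitive, not c.coding, not (c.coding and c.nonsynonymous), c.snp_id)


def pick_preferred_tag(candidates):
    """Return the candidate that best matches the stated preferences."""
    return min(candidates, key=preference_key)
```

A nonrepetitive, coding, nonsynonymous SNP is chosen over everything else; the SNP ID is only a final tie-breaker so the choice is reproducible.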

Haplotypes in Genetic Association Studies
Two main approaches with haplotypes:
Approach 1: haplotypes, then pick tagSNPs, then genotype samples
Approach 2: pick tagSNPs, then infer haplotypes, then test for association
Factors to consider: recombination, natural selection, haplotype block definition, population history, population demography

Haplotype Blocks
(Figure: Daly et al. Nat Genet (2001))

Block Definitions
Daly et al. Nat Genet (2001): strong LD; few haplotypes represent most chromosomes
Gabriel et al. Science (2002): blocks defined using D'

Block Definitions
Four-gamete test (figure: alleles A/a at one SNP and B/b at another):
Fewer than 4 two-SNP haplotypes observed (D' = 1): within a block
All 4 two-SNP haplotypes observed (D' < 1): block boundary

Haplotype Blocks and tagSNPs
Identifying blocks and tagSNPs:
Manually: visual inspection of haplotypes
Algorithms: HapMap and Haploview
Example: LTA, 16 SNPs (MAF >10%) and 6 common haplotypes (figure)
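The four-gamete rule can be applied along a chromosome to split SNPs into blocks. This is a minimal Python sketch, assuming phased haplotypes are available as strings with one allele character per SNP; the left-to-right scan is one simple way to apply the rule, not necessarily the algorithm HapMap or Haploview use, and the names and the min_freq tolerance are hypothetical.

```python
from collections import Counter


def has_all_four_gametes(haplotypes, i, j, min_freq=0.0):
    """Four-gamete test for SNPs i and j on phased haplotypes.

    haplotypes: equal-length allele strings, one per chromosome.
    A two-SNP gamete counts as observed only if its frequency exceeds
    min_freq (a hypothetical tolerance for rare errors).
    """
    counts = Counter((h[i], h[j]) for h in haplotypes)
    n = len(haplotypes)
    observed = [g for g, c in counts.items() if c / n > min_freq]
    return len(observed) == 4


def four_gamete_blocks(haplotypes, min_freq=0.0):
    """Split SNPs into consecutive blocks with no pair showing four gametes."""
    n_snps = len(haplotypes[0])
    blocks, start = [], 0
    for j in range(1, n_snps):
        # Close the current block if SNP j shows all four gametes
        # with any SNP already in it.
        if any(has_all_four_gametes(haplotypes, i, j, min_freq)
               for i in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    blocks.append((start, n_snps - 1))
    return blocks
```

For the haplotypes "0000", "0011", "1100" and "1111", SNPs 0 and 1 show only two gametes, while SNPs 0 and 2 show all four, so the sketch returns the blocks (0, 1) and (2, 3).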

HapMap Data and Haploview (figure)

Import HapMap Data into Haploview (figure)


Note: HapMap does not contain complete variation data.

Variation Data, LD, and tagSNPs for ANAPC10 in European-Americans
tagSNPs and Genome Variation Server
HapMap: 5 tagSNPs
NIEHS SNPs: 12 tagSNPs
Note: Tagger is essentially the same as LDSelect.

Haplotypes, TagSNPs, and Caveats
Haplotypes are inferred.
Some software assumes a block-like structure.
Block definitions differ.
Block boundaries are sensitive to marker density.
Genotyping savings may be modest where recombination is high.
tagSNPs based on LD are more popular than haplotype-tagging SNPs (htSNPs).

SNP Selection Summary
Resources are available for pair-wise LD and haplotypes.
Software for tagSNP selection is available.
Be aware of the limitations of the approach you choose.
Be aware that some SNP datasets may not represent all common variation in a gene or gene region.
Be aware that a fraction of tagSNPs will not convert into a successful genotyping assay.