Take Home Message. Molecular Imaging Genomics. How to do Genetics. Questions for the Study of. Two Common Methods for Gene Localization

Size: px
Start display at page:

Download "Take Home Message. Molecular Imaging Genomics. How to do Genetics. Questions for the Study of. Two Common Methods for Gene Localization"

Transcription

1 Take Home Message Molecular Imaging Genomics David C. Glahn, PhD Olin Neuropsychiatry Research Center & Department of Psychiatry, Yale University This may be the most important thing I say in this lecture. There is no one size fits all in genetic analysis of complex traits. Laura Almasy July 22, 2010 How to do Genetics Questions for the Study of Question What do you want to know? Sample Who do you need to study? Method How will you use your data? 1) Is this trait influenced by genetic factors? How strong are these genetic influences? 2) Which traits are influenced by the same genes? 3) Where are the genes that influence a trait? 4) What are the specific genes that influence the trait? 5) What specific genetic variants influence the trait and how do they interact with each other and with the environment? Question 3: Localization Where are the genes that influence a trait? Two Common Methods for Gene Localization Linkage analyses: test for co-segregation of phenotype and genotype within families - a function of physical connections of genes on chromosomes Association analyses: test for deviations of phenotype-genotype combinations from that predicted by their separate frequencies - a function of linkage created by population history

2 Association Studies What is Association? Pro: Can be done with unrelated individuals. Statistical methods easy and fast. May be able to detect loci of smaller effect with a given sample size. Con: Requires, which may not be present, making power difficult to estimate. Susceptible to allelic heterogeneity. Tests for correlation between genotype and phenotype Association analyses work when: 1) your genotyped marker is a functional polymorphism 2) your genotyped marker is in with a functional polymorphism Linkage Disequilibrium (LD) How do we get LD? Linkage is the non-random association of alleles at two or more loci LD = the population frequency of allelic combinations the expected combinations from random formation of haplotypes Level of LD is influenced by a number of factors including genetic linkage, selection, the rate of recombination, the rate of mutation, genetic drift, non-random mating, and population structure. LD is unpredictable A 1 B 1 C 1 a mutation occurs A 1 B 1 B 2 C 1 Complete (by D, but not r-squared) How do we get LD? Complete A 1 B 1 B 2 C 1 A recombination occurs 1 A 1 A 2 B 1 B 2 C 1 Incomplete B 2 How do we get LD? Incomplete A 1 A 2 B 1 B 2 C 1 time passes, more recombination occurs Equilibrium the haplotype frequencies are the product of the allele frequencies p(b2) = p(b) p(2)

3 How do we get LD? A 1 B 1 C 1 a mutation occurs A 1 B 1 B 2 C 1 Complete recombination occurs A 1 A 2 B 1 B 2 C 1 Incomplete time passes, more recombination occurs Equilibrium Association test for unrelated individuals: Quantitative traits Genotype N mean variance n µ 2! 2 n µ! n µ d = µ " µ 2! +! 2 n Assuming the trait values are normally distributed n 2! Association test for unrelated Transmission test A B Aff a c Unaff # 2 = (a+b+c+d)(ad-bc)2 (a+b)(c+d)(a+c)(b+d) b d?? Transmitted allele A B Nontransmitted allele A B a=0 b c d=0 # 2 = (b-c) 2 /(b+c) Do you feel lucky? Memory Activation & APOE $4 Risk gene for Alzheimer s 16 APOE $4 14 APOE $3 ~3 Billion base pairs Bookheimer et al., N Engl J Med, 2000

4 Fear Response & Serotonin What if you don t know the allele? A genome-wide association (GWA) study is an approach that involves rapidly scanning markers across the complete sets of DNA, or genomes, of many people to find genetic variations associated with a trait. Currently: 700K to 1.5Mil SNPs Hariri et al., Science, 2002 Association, LD & Haplotypes LD Pattern for FVII Gene Unlikely that functional variants are typed Association studies are dependent on LD between a genotyped marker and a functional variant. May change with complete sequencing Multiple testing: p-values Multiple testing A p-value of 0.05 implies that 5% of the time we will reject the null hypothesis (i.e. conclude that we have an association) when the null hypothesis is actually correct If we test 100 SNPs and each time we use a p- value of 0.05 as our cutoff for significance, we would expect 5 of those SNPs to be significant (p < 0.05) just by chance The simplest correction is the Bonferroni: multiply each p-value by the total number of tests, or divide the significance threshold required by the number of tests (0.05 / #). For 550K SNPs, requires p=1.2x10-7 This maintains an experiment-wide significance threshold, but may be too conservative when the tests are correlated, as they might be if some of the markers tested are in LD with each other.

5 Temporal Lobe GWA Cortical Thickness GWA n=742 rs p=1.3x10-7 rs P=3.1x10-7 Association found at rs (p=4.0x10-10), chr 13q31, near SLIT- & NTRK-like family, member 6. SLITRKs are expressed predominantly in neural tissues and have neurite-modulating activity. Associated with Tourette's Syndrome Stein et al., NeuroImage, 2010 Glahn et al., ACNP, 2009 Stein et al., NeuroImage, 2010 VBM & GWA Limitations of Association A QTL may be in equilibrium with the other polymorphisms surrounding it. Disequilibrium need not be present. Since LD need not be present, negative association results have implications only for the marker you have tested, lack of association does not exclude the gene or region. Population Stratification: If the sample contains multiple populations that differ in the trait of interest, any locus whose allele frequencies differ between the populations will show association Example: Hypertension Example: Hypertension African Americans 70% A, 30% B Affected 64% A, 36% B European Americans 50% A, 50% B Unaffected 56% A, 44% B

6 Minimizing Limitations of Linkage Studies 1. Match cases and controls carefully or try to obtain subjects from a single well defined population. 2. Use one of a variety of statistical approaches designed to deal with population stratification (e.g. TDT, genomic control) Pro: Power to find a gene can be more easily quantified. Guaranteed to work if the sample size is large enough. Not influenced by allelic heterogeneity. Con: Need a large sample of related individuals. Large enough may be too large to be practical. Genetic Linkage Defined Measuring Linkage: Lod Score Genetic loci that are physically close to one another tend to stay together during meiosis. Independent assortment occurs when the genes on different chromosomes are separated by a great enough distance on the same chromosome that recombination occurs at least half of the time. An exception to independent assortment develops when genes appear near one another on the same chromosome. When genes occur on the same chromosome, they are usually inherited as a single unit. Genes inherited in this way are said to be linked, and are referred to as "linkage groups. LOD = log 10 (probability of birth sequence with a given linkage value/probability of birth sequence with no linkage) LOD = log 10 ((1-!) NR x! R )/0.5 NR+R NR denotes the number of non-recombinant offspring, R denotes the number of recombinant offspring. Theta = recombinant fraction= R / (NR + R) A LOD score >=3.0 is considered evidence for linkage A LOD score of 3 indicates 1000 to 1 odds that the linkage being observed did not occur by chance A LOD score <=-2.0 is considered evidence to exclude linkage Linkage with White-Matter Linkage with White-Matter Tracts LOD = 2.36 Chr 15q25 Bivariate linkage for subcortical & ependymal HWM volumes (log of odds=2.12) on chr 1 at 288 cm. Kochunov et al., Stroke, family members Kochunov et al., Stroke, family members

7 Linkage vs. Association Determining Linkage Power Association: you re testing for an excess of a specific combination of alleles at two loci. The same alleles must be traveling together at a population level. Detects effects of common variants. Linkage: you re testing for an excess of the parental type. That parental type (i.e. the alleles traveling together) could be different in every family (i.e. linkage equilibrium) and you would still get linkage. Can detect cumulative effect of multiple variants (including rare variants). The power to map a QTL in a human linkage study is a function of: 1. locus-specific heritability (genetic signalto-noise ratio) 2. Sample size 3. Pedigree size and complexity Sample size required for 80% power to detect linkage to a QTL at a LOD of 3 Determining Association Power Number of Individuals 100,000 10,000 1, Heritability due to QTL Pedigree Sibship (2) Sibship (4) The power to find association in a human study is a function of: 1. QTN-specific heritability (not QTL) 2. r 2 between the QTN and a genotyped marker 3. Sample size Combined Linkage/Association Analysis Heritability of Caudate Volume Best of both worlds QTL localization approach Linkage can detect cumulative effect of multiple variants (including rare variants). Association detects effects of common variants. Joint test of linkage/association more powerful than association alone when there is linkage. Only minor loss of power in the absence of linkage. Implemented in SOLAR Glahn et al., ASHG, 2009 Glahn et al., ASHG, 2009 Adjusted for covariates: Age, Age 2, Sex, Sex*Age, Sex*Age 2 h 2 =0.685 p=2.3"10-10 Combined left & right

8 SNPs Influencing Caudate Volume Candidate Gene: PRKAR1B PRKAR1B LRRTM4 DYNC1I1 7p22 (rs ) Protein kinase, campdependent, regulatory, type I, beta Nominal p-value = 2.3"10-8 Involved in brain development Associated with poor memory Glahn et al., ASHG, 2009 Solberg et al., Genomics, 1992 Joint Linkage/Association vs. Approaches to Genotyping Joint Linkage/Association LRRTM4 Association Candidate genes: genotype only markers in genes potentially related to the trait. Pro: fast and easy, may be able to be more thorough with a higher density of markers Con: must get lucky in choice of genes, lower potential for something really novel Genome screen: genotype anonymous markers spanning the genome at regular intervals Pro: can identify previously unknown genes, covers all of the possibilities Con: slower and more expensive, may have lower marker density which could translate to less power Glahn et al., ASHG, 2009 Question 4: Identification What specific genes influence the trait? Identifying a Causal Gene Once a significant QTL is identified, additional genetic tests are needed to determine the exact identity of the gene Association: identifies a genomic region of ~500kb (250kb to either side of the association) determined by the general extent of linkage Linkage: detect the cumulative additive genetic signal of all functional variants within a much larger genomic region (e.g Mb)

9 VPS13A: A Candidate Gene for Association Analysis: VPS13A SNPs and Frontal Lobe Volume VPS13A is a 73 exon gene localizing to chromosome 9q21! Involved in choreaacanthocytosis! Schizophrenia! OCD! Frontal lobe functioning rs rs amino acid protein Bellis et al., ASHB, 2009 Bellis et al., ASHB, 2009 Question 5: Characterization Gene x Environment Interaction What specific genetic variants influence the trait and how do they interact with each other and with the environment? Q Non-smokers DD Dd dd Smokers The effect of a particular genotype differs in different environments OR the effect of the environment depends on genotype Epistasis: Gene x Gene Interaction Coffee Break Q Bb bb 0 CC Cc cc The effect of a genotype at one locus depends on the genotype you have at another locus.

10 Association Notation Two bi-allelic markers: Locus 1: A a Locus 2: B b Allele frequencies: P A, P a, P B, P b Haplotype frequencies: P, P Ab, P ab, P ab, Estimate Association: Lewontin s D D = P P A x P B = P ab P a x P b = -P Ab P A x P b = -P ab P a x P B = P x P ab P Ab x P ab Limitations of D r 2 as a measure of LD D lacks sensitivity to distinguish different degrees of LD. D =1 if at least one haplotype is missing. Considerable upward bias for small to moderate sample sizes. Take extreme values when at least one allele frequency is small. Measures statistical association between genotypes Inverse relationship between r 2 and sample size required to detect indirect association. Pritchard and Przeworski For low allele frequencies, r 2 has more reliable sample properties than D. Mendel s Pea Plants Experimental cross: Round green % Wrinkled yellow Seed Surface Seed Color P1 RRGG % rrgg Phenotyp e Genotyp e Round Wrinkled Green Yellow RR or Rr Rr GG or Gg gg Gametes RG F1 RrGg % RrGg Gametes RG Rg rg rg rg

11 Mendel s Law of Independent More Pea Plants (Bateson and Punnett) RG Rg rg rg RG RRGG RRGg RrGG RrGg Rg RRGg RRgg RrGg Rrgg rg RrGG RrGr rrgg rrgg rg RrGg Rrgg rrgr rrgg Round Green Round Yellow Wrinkled Green Wrinkled Yellow Exp Obs Exp Obs Purple Long Purple Round Red Long Red Round Alleles of different genes assort independently of one another during gamete formation. Different traits are inherited independently of each other, unless genes are linked. Flower color: Pollen shape: Purple or Red Long or Round Recombination - a physical entity Morgan s Flies gl = glass eye eb = ebony body + = wild type Germ Cell P1 gl eb Fem. x ++ Male gl eb + + gl eb Egg or Sperm F1 x + + gl eb Males Recombination - a statistical % + eb % gl % gl eb % Distances cm: genetic distances, based on recombination bp: basepairs, physical distance, Parental = 92% Recombinant = 8% & = 0.08