Heritable Diseases. Lecture 2 Linkage Analysis. Genetic Markers. Simple Assumed Example. Definitions. Genetic Distance

Lecture 2: Linkage Analysis
Jurg Ott

Heritable Diseases
Diseases may run in families. Why?
Infections can be passed from one family member to another.
Genes also run in families.

Genetic Markers
Loci that are polymorphic (two or more alleles) and inherited in a Mendelian manner.
Signposts in genetic mapping for localizing new genes.
Definition: the most common allele has frequency < 0.95 (or < 0.99).
Up to 60 classical markers: enzyme polymorphisms, blood groups, etc.
DNA polymorphisms:
o Microsatellites: highly polymorphic, mutable.
o SNPs: stable differences in DNA sequence; >10 million SNPs discovered.

Simple Assumed Example
[Figure: pedigree with genotypes AA BB and aa bb; offspring classified as nonrecombinant (N) or recombinant (R) with respect to locus 1 (alleles A, a) and locus 2 (alleles B, b); inset illustrating crossing over, or crossover.]

Definitions
Recombination: alleles at different loci have different grandparental origin.
Recombination fraction, $\theta$ = proportion of recombinants = probability that a recombination occurs.
Crossovers cannot be observed directly, only their phenotypic expression as recombinations.
Multiple crossover points on a gamete:
o Odd number: recombination
o Even number: no recombination

Genetic Distance
Crossovers occur randomly on a chromosome (not necessarily uniformly distributed).
Genetic distance (map distance) between two points, $x$ = expected number of crossovers between them on a gamete.
Unit of measurement: Morgan (M); 1 M = 100 cM.
Chiasma interference; no chromatid interference.
Crossover frequencies differ between females and males: genetic distances are sex-specific (and perhaps age-specific).
Example: $\theta = 0.03$ corresponds to $x \approx 0.03$ M = 3 cM.
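
The odd/even rule above can be checked numerically: the recombination fraction is the probability of an odd number of crossovers on a gamete. The sketch below assumes the crossover count follows a Poisson distribution with mean $x$ (no interference). The 0.3 M interval and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k crossovers on a gamete when the count is Poisson(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def recombination_probability(pmf, max_crossovers: int = 60) -> float:
    """Recombination = odd number of crossovers, so sum P(k) over odd k (truncated at max_crossovers)."""
    return sum(pmf(k) for k in range(1, max_crossovers + 1, 2))


# Hypothetical interval: genetic distance x = 0.3 M (30 cM) = 0.3 expected crossovers per gamete.
x = 0.3
theta = recombination_probability(lambda k: poisson_pmf(k, x))
print(round(theta, 4))  # 0.2256: below x, because gametes with 2, 4, ... crossovers are nonrecombinant
```

Under the Poisson assumption the odd-count sum equals $\tfrac{1}{2}(1 - e^{-2x})$, so $\theta$ approaches, but never exceeds, $\tfrac{1}{2}$ as $x$ grows.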

Map Functions
A map function relates the recombination fraction, $\theta$, to the genetic distance, $x$.
Morgan's map function: $\theta = x$ (good for small distances).
Haldane's map function: crossovers occur independently of each other, giving $\theta = \tfrac{1}{2}\left(1 - e^{-2x}\right)$.
[Figure: recombination fraction plotted against map distance.]

Physical and Genetic Lengths of Human Autosomes
Matise et al (2007) Genome Res 17
[Figure: genetic length plotted against physical length for the human autosomes.]

Recombination Intensity
Tapper et al (2005) PNAS 102, Chrom. 9
Recombination rates vary along the chromosome and by sex.

Recombination Fraction and Age
Haldane & Crew (1925) Nature 115 (2896):S2
Offspring of phase-known matings in poultry: 5 cocks, doubly heterozygous BS/bs for two sex-linked Mendelian loci, mated with bs hens. The four possible offspring types were all distinguishable phenotypically. Total of 648 chicks.
[Figure: recombination fraction plotted against breeding year.]
The recombination fraction became progressively larger with advancing age of the cocks.

How Do We Localize Genes for Heritable Diseases?
Collect families with affected individuals.
Draw blood and extract DNA.
Determine genotypes for ~400 to 100,000s of markers along the genome.
Track the inheritance of marker alleles and of the trait in pedigrees: linkage analysis.

Counting Recombinants
Bird TD, Ott J, Giblett ER (1982) Am J Hum Genet 34
Dominantly inherited disease: CMT (Charcot-Marie-Tooth neuropathy).
Duffy locus (FY) on chromosome 1, alleles a and b.
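
The two map functions above can be written as a minimal Python sketch, together with the inverse of Haldane's function for converting an observed $\theta$ back into a distance. The function names are hypothetical. Morgan's function is restricted to $x \le 0.5$ M because a recombination fraction cannot exceed $\tfrac{1}{2}$.

```python
import math


def morgan_theta(x: float) -> float:
    """Morgan's map function theta = x; only meaningful for 0 <= x <= 0.5 Morgans."""
    if not 0.0 <= x <= 0.5:
        raise ValueError("Morgan's map function applies only for 0 <= x <= 0.5")
    return x


def haldane_theta(x: float) -> float:
    """Haldane's map function (independent crossovers): theta = (1 - exp(-2x)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * x))


def haldane_distance(theta: float) -> float:
    """Inverse of Haldane's map function: x = -ln(1 - 2 theta) / 2, for 0 <= theta < 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)


for cm in (3, 10, 30, 50):
    x = cm / 100.0  # 1 Morgan = 100 cM
    print(f"{cm:>3} cM   Morgan theta = {morgan_theta(x):.4f}   Haldane theta = {haldane_theta(x):.4f}")
```

At 3 cM the two functions agree to within 0.001, which is why $\theta \approx x$ is acceptable for small distances. At 50 cM they differ by almost 0.2.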

Problems
Penetrance: age at onset, incomplete penetrance, phenocopies.
Parents unavailable.
Individuals not consenting to the study.
In general, recombinants and nonrecombinants cannot be counted.
Solution: estimate the recombination fraction by the maximum likelihood method.

Likelihood
Likelihood = probability of the data. It depends on unknown parameters: $L(\theta) = P(\text{data}; \theta)$.
Phase-known double backcross: $L(\theta) = \theta^{k}(1-\theta)^{n-k}$, where $k$ = number of recombinants and $n$ = total number of meioses.
Phase-unknown double backcross: $L(\theta) = \tfrac{1}{2}\left[\theta^{k}(1-\theta)^{n-k} + \theta^{n-k}(1-\theta)^{k}\right]$.
[Figure: backcross pedigree in which the parental phase is unknown; offspring labeled recombinant (R) or nonrecombinant (N).]

Lod Score
Lod score = scaled log likelihood ratio: $Z(\theta) = \log_{10}\left[L(\theta)/L(\theta = \tfrac{1}{2})\right]$, where $\theta$ = trial value for the recombination fraction.
With linkage, the lod score tends to increase as data accumulate.
Maximum lod score $\ge 3$: significant linkage.
Example: 5 meioses, 1 recombinant and 4 nonrecombinants (given phase).
[Figure: lod score plotted against recombination fraction.]
Phase known: estimate $\hat{\theta} = 0.20$, lod score = 0.419.
Phase unknown: estimate $\hat{\theta} = 0.21$, lod score = …

Confidence Interval for $\theta_0$
Distinguish between the true recombination fraction, $\theta_0$, and the parameter $\theta$ in expressions for the likelihood.
For known recombination counts, exact procedures based on the binomial distribution may be used to compute the CI (Binom program).

The 1-Lod-Down Confidence Interval
Conneally PM, Edwards JH, Kidd KK, Lalouel JM, Morton NE, Ott J, White R (1985) Report of the Committee on Methods of Linkage Analysis and Reporting. Cytogenet Cell Genet 40(1-4)
Constructed from the lod score curve: all values of $\theta$ whose lod score lies within 1 unit of the maximum.
Represents an approximate 95% CI.
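
The 5-meiosis example above can be reproduced with a short grid search. This is a sketch and not LIPED or any published program. It assumes the phase-known and phase-unknown backcross likelihoods given above, and it uses a grid of step 0.0001 instead of a numerical optimizer. The 1-lod-down interval is read off the same grid. All function names are hypothetical.

```python
import math


def lik_phase_known(theta: float, k: int, n: int) -> float:
    """Phase-known double backcross: theta^k (1 - theta)^(n - k)."""
    return theta**k * (1.0 - theta) ** (n - k)


def lik_phase_unknown(theta: float, k: int, n: int) -> float:
    """Phase unknown: average over the two equally likely phases (k or n - k recombinants)."""
    return 0.5 * (lik_phase_known(theta, k, n) + lik_phase_known(theta, n - k, n))


def lod(lik, theta: float, k: int, n: int) -> float:
    """Z(theta) = log10[L(theta) / L(1/2)]; -inf where L(theta) = 0 (e.g. theta = 0 with recombinants)."""
    ratio = lik(theta, k, n) / lik(0.5, k, n)
    return math.log10(ratio) if ratio > 0 else float("-inf")


# Trial values 0, 0.0001, ..., 0.5 (grid search in place of a numerical optimizer).
GRID = [i / 10000 for i in range(5001)]


def max_lod(lik, k: int, n: int) -> tuple[float, float]:
    """Return (theta_hat, Z(theta_hat)) by grid search over [0, 0.5]."""
    theta_hat = max(GRID, key=lambda t: lod(lik, t, k, n))
    return theta_hat, lod(lik, theta_hat, k, n)


def one_lod_down_interval(lik, k: int, n: int) -> tuple[float, float]:
    """Smallest and largest grid values of theta whose lod score is within 1 unit of the maximum."""
    _, z_max = max_lod(lik, k, n)
    inside = [t for t in GRID if lod(lik, t, k, n) >= z_max - 1.0]
    return min(inside), max(inside)


# Example from the text: 5 meioses, 1 recombinant and 4 nonrecombinants.
theta_k, z_k = max_lod(lik_phase_known, 1, 5)
theta_u, z_u = max_lod(lik_phase_unknown, 1, 5)
print(f"phase known:   theta_hat = {theta_k:.2f}, Z = {z_k:.3f}")
print(f"phase unknown: theta_hat = {theta_u:.2f}, Z = {z_u:.3f}")
print("1-lod-down interval (phase known):", one_lod_down_interval(lik_phase_known, 1, 5))
```

For the phase-known case this prints $\hat{\theta} = 0.20$ and $Z = 0.419$. When the phase is unknown, the estimate moves up to about 0.21 and the maximum lod score drops, because the likelihood must be averaged over both phases. The phase-known 1-lod-down interval extends to the upper bound 0.5: five meioses cannot exclude free recombination.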

Lod Score Calculated by Hand
Morton (1956) Am J Hum Genet 8
$c$ = recombination fraction
[Equation: $Z = \log_{10}$ of a lengthy polynomial in $c$ and $(1-c)$ with dozens of terms.]

LIPED
First generally available linkage program for large pedigrees.
Lod score tables published.
Development of LIPED was based on the Elston-Stewart (1971) algorithm (Hum Hered 21): recursive calculation of the pedigree likelihood.
John Edwards: "unnatural method"

Computer Programs for Linkage Analysis
LIPED (Ott 1974, 1976): 2-point analysis
PAP (Hasstedt 1982)
LINKAGE (Lathrop et al. 1986); FastLINK
Mapmaker (Lander et al. 1987)
CRI-MAP (Phil Green)
Mendel (Lange et al. 1988)
Vitesse (O'Connell and Weeks 1995)
Genehunter (Kruglyak et al. 1996; Kong and Cox 1997)
Aspex (Risch)
Loki (Heath 1997)
SAGE (Elston)
Allegro (Gudbjartsson et al. 2000)
Merlin (Abecasis)
Simwalk2 (D. Weeks)

Genetic Maps
Botstein D, White RL, Skolnick M, Davis RW (1980) Am J Hum Genet 32, 314-331
We describe a new basis for the construction of a genetic linkage map of the human genome. The basic principle of the mapping scheme is to develop, by recombinant DNA techniques, random single-copy DNA probes capable of detecting DNA sequence polymorphisms, when hybridized to restriction digests of an individual's DNA. Each of these probes will define a locus. Loci can be expanded or contracted to include more or less polymorphism by further application of recombinant DNA technology. Suitably polymorphic loci can be tested for linkage relationships in human pedigrees by established methods; and loci can be arranged into linkage groups to form a true genetic map of "DNA marker loci." Pedigrees in which inherited traits are known to be segregating can then be analyzed, making possible the mapping of the gene(s) responsible for the trait with respect to the DNA marker loci, without requiring direct access to a specified gene's DNA. For inherited diseases mapped in this way, linked DNA marker loci can be used predictively for genetic counseling.
150 markers will be sufficient.

Genetic Maps 2
Donis-Keller H et al (1987) A genetic linkage map of the human genome. Cell 51(2)
We report the construction of a linkage map of the human genome, based on the pattern of inheritance of 403 polymorphic loci, including 393 RFLPs, in a panel of DNAs from 21 three-generation families. By a combination of mathematical linkage analysis and physical localization of selected clones, it was possible to arrange these loci into linkage groups representing 23 human chromosomes. We estimate that the linkage map is detectably linked to at least 95% of the DNA in the human genome.

Genetic Maps 3
Dausset J, Cann H, Cohen D, Lathrop M, Lalouel JM, White R (1990) Centre d'Etude du Polymorphisme Humain (CEPH): Collaborative Genetic Mapping of the Human Genome. Genomics 6
The Centre d'Etude du Polymorphisme Humain (CEPH) is a nonprofit research institute that makes available to the scientific community a valuable research resource. CEPH is committed to (1) make available to the scientific community DNA samples from a panel of reference families for the determination of genotypes for various DNA polymorphisms which may be used for the construction of the genetic map of the human genome and for other research areas dependent on access to such a common set of families, and (2) provide to the contributors of genotypes a compilation of all data that accumulate on the panel of families.

Genome Screen for Alzheimer Disease
Liu et al (2007) Am J Hum Genet 81:17-31

Sequential Likelihood Ratio Test
Morton (1955) Am J Hum Genet 7
Test of $\theta = 0.5$ vs. $\theta = 0.2$:
o Accept linkage when the combined lod score $Z(0.2) \ge 3$.
o Reject linkage when $Z(0.2) < -2$ (values of $\theta$ with $Z(\theta) < -2$ are excluded).
o Otherwise, continue sampling.
Today, apply an LR test of $\theta = 0.5$ vs. $\theta < 0.5$.

Current: LR Test for Linkage
Test statistic: $LR = L(\hat{\theta})/L(\tfrac{1}{2})$; $2\ln(LR)$ is asymptotically $\chi^2$-distributed.
Genome-wide asymptotic thresholds for a significance level of 5% (Lander & Kruglyak (1995) Nat Genet 11, 241-7):
o Lod score analysis: 3.3
o Sib-pair allele sharing: 3.6
Morton: "They assume no linkage"

Locus Heterogeneity
Morton (1956) Am J Hum Genet 8, 80
Each family has its own recombination fraction: $\theta_i$ = recombination fraction in the $i$-th family.
Simple likelihood ratio test: $2\ln(10)\left[\sum_i Z_{\max,i} - Z_{\max,\text{total}}\right] \sim \chi^2$ with $n - 1$ df, where $n$ = number of families.

Locus Heterogeneity
Smith (1963) Ann Hum Genet 27, 175
Realistically, there are only 2 recombination fractions, $\theta < 0.50$ and $\theta_0 = 0.50$: some families with linkage and others without linkage, i.e., a mixture of these two family types.
Solution: estimate 2 parameters:
o $\alpha$ = proportion of families with linkage
o $\theta$ = recombination fraction in linked families
Computer program: HOMOG (Ott; Lap-Chee Tsui)

Heterogeneity: Osteogenesis Imperfecta and GC Blood Types
Vogel and Motulsky (1986) Human Genetics. Springer
[Figure: female recombination fraction, $\theta_f$, plotted against male recombination fraction, $\theta_m$.]
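
Two calculations from the section above can be sketched in Python. The first converts a maximum lod score into a pointwise p-value, using $2\ln(LR) = 2\ln(10)\,Z$ and assuming the usual 50:50 mixture of $\chi^2_0$ and $\chi^2_1$ for a test restricted to $\theta \le \tfrac{1}{2}$. The second fits Smith's admixture model by maximizing the HLOD over $\alpha$ and $\theta$. This is not HOMOG's code. The families are assumed phase known with counted recombinants, the data are made up, and a coarse grid replaces numerical maximization.

```python
import math


def lod_to_pvalue(z: float) -> float:
    """Pointwise p-value for a maximum lod score z.

    Assumes 2 ln(10) z follows a 50:50 mixture of chi-square(0 df) and chi-square(1 df).
    """
    if z <= 0:
        return 1.0
    chi2 = 2.0 * math.log(10.0) * z
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))  # 0.5 * P(chi2_1 >= chi2)


def family_ratio(theta: float, k: int, n: int) -> float:
    """L_i(theta) / L_i(1/2) for a phase-known family with k recombinants in n meioses."""
    return theta**k * (1.0 - theta) ** (n - k) / 0.5**n


def hlod(alpha: float, theta: float, families) -> float:
    """Admixture lod: sum over families of log10[alpha * L_i(theta)/L_i(1/2) + (1 - alpha)]."""
    total = 0.0
    for k, n in families:
        value = alpha * family_ratio(theta, k, n) + (1.0 - alpha)
        if value <= 0.0:
            return float("-inf")
        total += math.log10(value)
    return total


def max_hlod(families, steps: int = 100):
    """Grid search over alpha in [0, 1] and theta in [0, 0.5]; returns (hlod, alpha, theta)."""
    best = (float("-inf"), 0.0, 0.5)
    for i in range(steps + 1):
        for j in range(steps // 2 + 1):
            alpha, theta = i / steps, j / steps
            h = hlod(alpha, theta, families)
            if h > best[0]:
                best = (h, alpha, theta)
    return best


# Hypothetical data, (recombinants, meioses) per family: two tightly linked, two unlinked.
families = [(0, 10), (0, 10), (5, 10), (5, 10)]
h, alpha_hat, theta_hat = max_hlod(families)
z_homog = max(hlod(1.0, j / 100, families) for j in range(51))  # alpha = 1 means homogeneity
print(f"HLOD = {h:.2f} at alpha = {alpha_hat:.2f}, theta = {theta_hat:.2f}")
print(f"homogeneous max lod = {z_homog:.2f}")
print(f"pointwise p-value for lod 3.3: {lod_to_pvalue(3.3):.1e}")
```

Setting $\alpha = 1$ recovers the ordinary (homogeneous) lod score, so the HLOD can never be smaller than it. In this made-up example, pooling the families under homogeneity dilutes the evidence to a maximum lod score of about 2.27 at $\theta = 0.25$. The admixture model attributes the recombinants to an unlinked half of the families instead.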

Identity by Descent (IBD)
Alleles shared IBD: copies of the same ancestral allele.
[Figure: example pedigrees with marker genotypes such as 2/5, 3/4 and 5/5, contrasting alleles shared IBD with alleles that are identical by state (IBS) but not IBD.]

Affected Sibpairs (ASPs)
[Figure: parent heterozygous at the marker (phase 1/2 or 2/1) with two affected offspring of disease genotype tt.]
$c$ = recombination fraction.
Probability of offspring genotypes (per parent): each of the two outcomes in which both sibs receive the same marker allele has probability $[c^2 + (1-c)^2]/2$; each of the two outcomes in which they receive different marker alleles has probability $c(1-c)$.

Affected Sibpairs (ASPs)
Allele sharing: tests
Share allele from given parent: probability $c^2 + (1-c)^2$. Do not share allele from given parent: probability $2c(1-c)$.
Per parent: proportion of parents transmitting the same allele, $S = c^2 + (1-c)^2$, with $\tfrac{1}{2} \le S \le 1$. $H_0$: $S = \tfrac{1}{2}$.
Per sibship: $H_0$: proportions of sibships sharing 0, 1, and 2 alleles = $\tfrac{1}{4}$, $\tfrac{1}{2}$, $\tfrac{1}{4}$, respectively.
The test for $S > \tfrac{1}{2}$ can be carried out for any disease.
Extension to other relatives: Whittemore statistic, implemented in Genehunter.

Equivalence with Recessive Inheritance
Knapp et al (1994) Hum Hered 44, 44-51
ASP analysis is completely equivalent to lod score analysis under recessive inheritance with full penetrance and parents of unknown phenotype.
Elegantly allows for multiple affected offspring: no need for analysis of all pairs and complicated weighting schemes.

Quantitative Phenotypes
The mean depends on genotype (e.g., tt, TT).
Dominant example: only TT genotypes have elevated mean levels.
Test whether means differ between genotypes:
o ANOVA (association)
o Linkage analysis in families
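
The per-parent and per-sibship sharing probabilities above can be sketched in a few lines of Python. The sketch assumes each parent is doubly heterozygous and transmits independently with the same $S$, as in the recessive, fully penetrant case. The one-sided exact binomial test of $S = \tfrac{1}{2}$ is one simple way to test the per-parent hypothesis; it is not necessarily the test implemented in any particular program. The 15-of-20 counts are hypothetical.

```python
import math


def parent_sharing_prob(c: float) -> float:
    """S = c^2 + (1 - c)^2: probability a parent passes the same marker allele to both affected sibs."""
    return c**2 + (1.0 - c) ** 2


def ibd_distribution(c: float) -> tuple[float, float, float]:
    """P(sib pair shares 0, 1, 2 alleles IBD), assuming both parents transmit independently with the same S."""
    s = parent_sharing_prob(c)
    return (1.0 - s) ** 2, 2.0 * s * (1.0 - s), s**2


def sharing_test_pvalue(same: int, parents: int) -> float:
    """One-sided exact binomial test of H0: S = 1/2 vs. S > 1/2, i.e. P(X >= same) for X ~ Bin(parents, 1/2)."""
    return sum(math.comb(parents, j) for j in range(same, parents + 1)) / 2**parents


print(ibd_distribution(0.5))   # no linkage: (0.25, 0.5, 0.25)
print(ibd_distribution(0.05))  # tight linkage shifts weight toward sharing 2 alleles
# Hypothetical data: 15 of 20 parents transmitted the same marker allele to both affected sibs.
print(round(sharing_test_pvalue(15, 20), 4))
```

At $c = \tfrac{1}{2}$ the function returns the null proportions $\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4}$ stated above. With complete linkage ($c = 0$) every affected pair shares both alleles.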

Age of Disease Onset
Assume a normal distribution for $a$ = onset age. Often, $f(a)$ is unknown.
Penetrances in breast cancer: Easton et al (1993) Am J Hum Genet 52
Use $A$ = current age.
P(affected by age $A$ | at risk) = $F(A) = P(a \le A \mid \text{at risk})$: a cumulative, sigmoid curve.
Implementation is complicated, particularly when penetrance is incomplete at high age.

Linkage between QTL and Marker
Haseman & Elston (1972) Behav Genet 2, 3-19
Regress the square of the difference between sib-pair trait values on the estimated proportion of marker alleles that the sib pair shares IBD.
Various extensions have been published.
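
Both ideas in this section can be sketched in Python. The first is an age-dependent penetrance $F(A)$ under the normal onset-age model; the mean of 50 years and SD of 10 years are hypothetical. The second is the Haseman-Elston regression as described above, fitted by ordinary least squares without a significance test. The sib-pair data and function names are hypothetical.

```python
from statistics import NormalDist


def cumulative_penetrance(age: float, mean_onset: float, sd_onset: float) -> float:
    """F(A) = P(onset age a <= A | at risk), with onset age assumed Normal(mean_onset, sd_onset)."""
    return NormalDist(mean_onset, sd_onset).cdf(age)


def haseman_elston(trait_pairs, pi_hat):
    """Least-squares fit of (x1 - x2)^2 = intercept + slope * pi_hat; returns (intercept, slope).

    A significantly negative slope (squared differences shrink as IBD sharing grows)
    is the evidence for linkage between the marker and a QTL.
    """
    y = [(a - b) ** 2 for a, b in trait_pairs]
    n = len(y)
    mean_x, mean_y = sum(pi_hat) / n, sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in pi_hat)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(pi_hat, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical onset-age model: mean 50 years, SD 10 years.
for age in (30, 50, 70):
    print(age, round(cumulative_penetrance(age, 50.0, 10.0), 3))

# Hypothetical sib pairs (trait values) with estimated IBD proportions 0, 1/2, 1.
pairs = [(10.0, 7.0), (9.0, 7.0), (8.0, 7.0), (6.0, 3.0), (5.0, 3.0), (4.0, 3.0)]
pi = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
intercept, slope = haseman_elston(pairs, pi)
print(round(intercept, 3), round(slope, 3))
```

In the made-up sib-pair data, pairs that share more alleles IBD have more similar trait values, so the fitted slope is negative. A real analysis would also test whether the slope differs significantly from zero.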