Genetic maps from my past, present and future. Karl W Broman


1 Genetic maps from my past, present and future. Karl W Broman, Biostatistics & Medical Informatics, School of Medicine and Public Health, University of Wisconsin–Madison

3 Eucalypt genetic map [Figure: linkage groups with marker names and map positions.] Byrne et al., Theor Appl Genet 91: ,

4 What is a genetic map? A sequence-based map measures distance between chromosome locations in basepairs. A genetic map measures distance between chromosome locations via the recombination rate at meiosis. Two markers are d centimorgans (cM) apart if there is an average of d crossovers in the intervening interval in every 100 meiotic products.

5 Meiosis

6 Genetic distance Genetic distance between two markers (in cM) = average number of crossovers in the interval per 100 meiotic products. It is the intensity of the crossover point process. Recombination rate varies by organism, sex, chromosome, and position on the chromosome.
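The definition above can be turned into a one-line estimate. The following is a minimal sketch, assuming the crossover count in the interval is known for each meiotic product (in practice it usually is not observed directly); the function name and the example counts are hypothetical.

```python
def genetic_distance_cM(crossover_counts):
    """Estimate genetic distance in cM from per-product crossover counts.

    crossover_counts[i] is the number of crossovers in the interval
    for meiotic product i (hypothetical, directly observed data).
    """
    if not crossover_counts:
        raise ValueError("need at least one meiotic product")
    mean_crossovers = sum(crossover_counts) / len(crossover_counts)
    # d cM = an average of d crossovers per 100 meiotic products
    return 100.0 * mean_crossovers


# Hypothetical counts for 10 meiotic products (4 crossovers in total)
counts = [0, 1, 0, 0, 1, 0, 0, 0, 2, 0]
print(genetic_distance_cM(counts))
```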

7 Recombination fraction We generally do not observe the locations of crossovers; rather, we observe the grandparental origin of DNA at a set of genetic markers. Recombination across an interval indicates an odd number of crossovers. Recombination fraction = Pr(recombination in interval) = Pr(odd number of crossovers in interval).
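As an illustration of estimating a recombination fraction from grandparental origins, here is a minimal sketch. It assumes phase is known, so that each meiotic product carries a 0/1 code for grandparental origin at each marker (None when missing); this encoding and the function name are assumptions made for the example.

```python
def recombination_fraction(origins_a, origins_b):
    """Fraction of informative meiotic products recombinant between two markers.

    origins_a[i], origins_b[i]: grandparental origin (0 or 1) of product i
    at markers A and B, or None if the genotype is missing.
    """
    informative = [
        (a, b)
        for a, b in zip(origins_a, origins_b)
        if a is not None and b is not None
    ]
    if not informative:
        raise ValueError("no informative meiotic products")
    # A switch in grandparental origin implies an odd number of crossovers
    recombinants = sum(1 for a, b in informative if a != b)
    return recombinants / len(informative)


# Hypothetical data: 8 products, one missing genotype at marker A
marker_a = [0, 0, 1, 1, 0, None, 1, 0]
marker_b = [0, 1, 1, 1, 0, 0, 1, 1]
print(recombination_fraction(marker_a, marker_b))  # 2 recombinants out of 7
```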

8 Map functions A map function relates the genetic length of an interval and the recombination fraction: r = M(d). Map functions are related to crossover interference, but a map function is not sufficient to define the crossover process. Haldane map function: no crossover interference. Kosambi: similar to the level of interference in humans. Carter-Falconer: similar to the level of interference in mice.
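The three map functions named above have standard closed forms, sketched below with d in cM. The Carter-Falconer function has a closed form only for its inverse, d = (1/4)[atanh(2r) + atan(2r)] in Morgans, so the forward direction here is obtained by bisection; that numerical approach is a choice made for this sketch, not something taken from the slides.

```python
import math


def haldane(d_cM):
    """Haldane map function: r = (1 - exp(-2d)) / 2, d in Morgans."""
    d = d_cM / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi(d_cM):
    """Kosambi map function: r = tanh(2d) / 2, d in Morgans."""
    d = d_cM / 100.0
    return 0.5 * math.tanh(2.0 * d)


def carter_falconer_inverse(r):
    """Carter-Falconer inverse: distance in cM for 0 <= r < 0.5."""
    return 100.0 * 0.25 * (math.atanh(2.0 * r) + math.atan(2.0 * r))


def carter_falconer(d_cM, tol=1e-12):
    """Carter-Falconer map function, inverted numerically by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if carter_falconer_inverse(mid) < d_cM:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for d in (1, 10, 20, 50):
    print(d, round(haldane(d), 4), round(kosambi(d), 4), round(carter_falconer(d), 4))
```

For short intervals all three give r close to d/100; they diverge as d grows, with stronger assumed interference keeping r closer to d/100 for longer.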

9 Ordering markers A B C a b c ABC ABc Abc AbC abc abc abc abc Marker orders: A B C A C B B A C With M markers, there are M!/2 possible orderings. For M = 100, M!/
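To get a feel for how quickly the number of possible orders grows, the count M!/2 can be evaluated directly. This is a small sketch; the function name is made up for the example.

```python
import math


def n_marker_orders(m):
    """Number of distinct orders of m markers, treating an order and its
    reverse as the same (M!/2 for m >= 2)."""
    if m < 2:
        return 1
    return math.factorial(m) // 2


print(n_marker_orders(3))  # the three orders listed for markers A, B, C
print(f"{n_marker_orders(100):.3e}")
```

Exhaustive comparison of all orders is clearly infeasible at M = 100, which is why marker ordering relies on heuristic search.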

13 Eucalypt pedigree A B C D 118 progeny

14 Inferring phase ( A1 A 2 B 1 or A 1 B 2 B 2 B 1 A 2 ) ( C1 C 2 D 1 or C 1 D 2 D 2 D 1 C 2 ) locus 1 locus 2 AC AD BC BC AC AD BC BD 14

15 Eucalypt genetic map [Figure: linkage groups with marker names and map positions.] Byrne et al., Theor Appl Genet 91: ,

16 CEPH pedigrees

17 Marshfield maps: Tasks Assemble data. Understand marker names: AFM, UT, CHLC (GATA etc.), Mfd, D*S*. Identify cryptic duplicates. Order markers and identify genotyping errors. Removed 764 / 969,425 genotypes.

18 CRIMAP chrompic ma -11-i i i--1111i-1111-i i pa o00o o o ma -11-i i111i--i i i pa i11-1i1-111-i111i i ma -11-i i o--0000o-0000-o o pa o00o i i ma -00-o o o--0000o-0000-o i pa i11i i i ma -00-o o o--0000o-0000-o o pa i11i i i ma -10-o o--o o i pa o o000o o ma -11-i i i--1111i-1111-i i pa o00o o o ma -11-o o o--0000o-0000-o o pa i11i i i ma -00-i i i--1111i-1111-i i pa o00o o o ma -11-i i i--1111i-1111-i o pa o00o o o

20 Top of chr 22 Marker Dnumber sex-ave(cm) female(cm) male(cm) 1 ATA2G02 Unknown GATA198B05 Unknown AFM217xf4 D22S AFM288we5 D22S yf5 D22S GGAA10F06 D22S AFMa037zd1 D22S AFM292va9 D22S Mfd51 D22S

21 Marker search

22 10th worst graph Broman et al., Am J Hum Genet 63: ,

23 Total no. crossovers Broman et al., Am J Hum Genet 63: ,

24 Crossover locations Broman and Weber, Am J Hum Genet 66: ,

25 Family 884, chr 6 Maternal chromosomes Paternal chromosomes

26 Autozygosity Broman and Weber, Am J Hum Genet 65: ,

27 Crossover interference Broman and Weber, Am J Hum Genet 66: ,

28 Maternal chr 8

29 Apparent triple XOs Broman et al., In: Science and Statistics: A Festschrift for Terry Speed,

30 Chr 8p inversion Broman et al., In: Science and Statistics: A Festschrift for Terry Speed,

31 Comparison to sequence Matise et al., Am J Hum Genet 70: ,

32 Comparison to sequence (revisited) [Figure: genetic map position (cM) versus sequence position (Mbp), with markers D22S531 (UT5900), D22S529 (UT1963), D22S1175 (AFM331wc9) and D22S534 (UT7213) labeled.] Thanks to UCSC and Ensembl!

33 Dog genetic map Neff et al., Genetics 151: ,

35 The first dog map Mellersh et al., Genomics 46: ,

36 The dog X Mellersh et al., Genomics 46: ,

37 The dog X Mellersh et al., Genomics 46: , 1997 Neff et al., Genetics 151: ,

38 C. savignyi (sea squirt) Aim: Build a linkage map to improve long-range ordering of the draft genome sequence (which is currently represented as a few hundred "reftigs").

39 C. savignyi pedigree 48 progeny. Markers: PCR amplicon (primers in exons, spanning an intron), digested with a restriction enzyme; 2, 3 or 4 banding patterns. Tricky bits: Which marker phenotype corresponds to which genotype? Using information on locations of markers within reftigs.

40 Example of two markers HaeIII MboI AC AD BC BC AC AD BC BD

44 C. savignyi genetic map Figure 1. Integrated genetic map and extended sequence assembly. Fourteen linkage groups in C. savignyi were identified by genetically mapping 182 markers, corresponding to 110 distinct genetic loci. Vertical black bars represent the genetic linkage map. All bars are drawn to scale; the length of the bars is proportional to the number of centimorgans (cM). Map distances at each mapped genetic locus are indicated to the left of each bar. Blue bars on the right of each pair represent reftigs. Tick marks indicate the physical location of the markers among the 104 mapped reftigs. Transverse lines link the location of each marker on the genetic and physical maps. Numbers to the right of each assembly fragment correspond to reftig identifiers in Version 2.1 of the assembly (Small et al. 2007a). When two or more markers on the same reftig allowed the orientation of the fragment to be resolved, Hill et al., Genome Res 18: ,

45 High-resolution mouse map Shifman et al. (PLoS Biology 4:e395, 2006) constructed a high-resolution genetic map of the mouse genome: 10,202 SNPs; 80 families from the latest generations in a heterogeneous stock (HS) of outbred mice; 4,048 meioses. Valuable resource for mouse geneticists. Characterization of recombination rate variation, particularly regarding the sex difference in recombination.

46 Concerns To accommodate the analysis of 8 complex pedigrees, Shifman et al. used a sliding window of 5–15 SNPs. The remaining 72 families were all nuclear; many lacked parental genotype data or had genotype data on just one parent, and many were small (as few as 2 siblings). The software used (CRIMAP, last revised in 1990) makes some approximations that result in biased estimates of genetic distances (even the sex-averaged ones) in the case of small sibships with incomplete parental genotype data. The length of the genetic map (and that of particular chromosomes) was quite different from previous mouse maps.

47 A complex pedigree The diamonds are sibships.

48 What we did Obtained the raw data. Omitted 13 individuals with clear pedigree errors. Switched the sex of 26 individuals from female to male. Omitted 176 genotypes due to Mendelian inconsistencies. Split the large pedigrees into sibships (plus parents and grandparents). Split the larger sibships. Omitted sibships with no parental genotypes. Omitted small sibships ( 8 sibs) with genotype data on just one parent. Omitted 538 genotypes leading to apparent tight double crossovers. 48
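The last cleaning step, finding genotypes that lead to apparent tight double crossovers, can be sketched roughly as follows. This is a simplified illustration, not the procedure actually used: it assumes phase-known 0/1 grandparental origins along one chromosome in map order (None for missing) and flags a typed marker whose origin disagrees with matching typed neighbors on both sides, ignoring the map distances that a real analysis would take into account. The function name and data are hypothetical.

```python
def flag_tight_double_crossovers(origins):
    """Return indices of markers that imply a crossover on each side.

    origins: grandparental origin (0/1) at ordered markers along one
    meiotic product, with None for missing genotypes.
    """
    typed = [i for i, g in enumerate(origins) if g is not None]
    flagged = []
    for k in range(1, len(typed) - 1):
        left = origins[typed[k - 1]]
        middle = origins[typed[k]]
        right = origins[typed[k + 1]]
        # Flanking markers agree with each other but not with the middle one:
        # two crossovers around a single marker, a likely genotyping error.
        if left == right and middle != left:
            flagged.append(typed[k])
    return flagged


# Hypothetical product: marker 2 is a singleton; the switch at marker 6
# looks like an ordinary single crossover and is not flagged.
print(flag_tight_double_crossovers([0, 0, 1, 0, 0, None, 1, 1, 1]))
```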

49 Substantial differences The revised genetic maps... Are much smaller: the autosomal genome is 11% smaller in the revised maps. Show a much smaller sex difference: in Shifman et al., the female autosomal genome is 26% longer than the male; in the revised maps, it is 9% longer. Show fewer regions of unusually high recombination rate: torrid regions disappear or have markedly attenuated recombination rates.

50 Recombination rates (chr 1) [Figure: sex-averaged recombination rate (cM/Mb), Shifman et al. vs. revised, and sex-specific recombination rate (cM/Mb), male vs. female, against position (Mb).] Cox et al., Genetics 182: ,

51 Recombination rates (chr 5) [Figure: sex-averaged recombination rate (cM/Mb), Shifman et al. vs. revised, and sex-specific recombination rate (cM/Mb), male vs. female, against position (Mb).] Cox et al., Genetics 182: ,

52 Recombination rates (chr 18) [Figure: sex-averaged recombination rate (cM/Mb), Shifman et al. vs. revised, and sex-specific recombination rate (cM/Mb), male vs. female, against position (Mb).] Cox et al., Genetics 182: ,

53 Morals Genetic maps continue to be useful. Be careful about automated map construction. Careful, tedious work is often necessary. The simplest things can have the greatest impact. Artifacts can be more interesting than anything else. Don't give a sex-averaged map of the X chromosome. Use care in data cleaning. Split large pedigrees into non-overlapping sibships rather than resort to the use of a sliding window of markers. Use computer simulations to verify the appropriateness of the choices you make in a complex analysis.
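As a toy example of the last moral, a simulation can check that an analysis choice behaves as expected. The sketch below is an illustrative assumption rather than the simulations used for the revised mouse map: it simulates crossovers with no interference (a Poisson process at one crossover per Morgan) and checks that the simulated recombination fraction agrees with the Haldane map function.

```python
import math
import random


def simulate_recombination_fraction(d_cM, n_products=100_000, seed=1):
    """Simulate the recombination fraction across an interval of d_cM,
    assuming no crossover interference."""
    rng = random.Random(seed)
    length_morgans = d_cM / 100.0
    recombinant = 0
    for _ in range(n_products):
        # Place crossovers using exponential gaps (rate 1 per Morgan);
        # the number landing in the interval is Poisson(length_morgans).
        n_crossovers = 0
        position = rng.expovariate(1.0)
        while position < length_morgans:
            n_crossovers += 1
            position += rng.expovariate(1.0)
        # An odd number of crossovers gives a recombinant product
        recombinant += n_crossovers % 2
    return recombinant / n_products


d = 30.0
simulated = simulate_recombination_fraction(d)
expected = 0.5 * (1.0 - math.exp(-2.0 * d / 100.0))  # Haldane map function
print(simulated, expected)
```

The same pattern (simulate under known truth, run the analysis, compare) extends to checking estimates from small sibships with incomplete parental genotype data.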

54 Acknowledgments Terry Speed, University of California, Berkeley; Jim Weber, PreventionGenetics (formerly Marshfield Medical Research Foundation); Mark Neff, University of California, Davis; Matt Hill and Arend Sidow, Stanford; Beth Dumont, University of Wisconsin–Madison; Gary Churchill, Bev Paigen, Ken Paigen, et al., The Jackson Lab; Jonathan Flint, The Wellcome Trust
