How to Map a Marker Associated with a Major Gene
Bob Fjellstrom
USDA-ARS, Beaumont, TX

The Rice Genome
  About 430 million bp of DNA in the genome
  12 chromosomes
  Provides the code for the 35,000+ genes of rice
  How do I start finding a marker in all this?

Previous Knowledge about Trait
  Is inheritance known?
    Is the trait simply inherited?
  Has it been mapped?
    Consult Gramene (www.gramene.org).
    What markers are available?

Trait location not known
  If simply inherited, there are two main ways of finding linked markers:
    1) Whole genome mapping: assay 120 (+/-) polymorphic, well-distributed markers and look for linkage association.
    2) Bulk segregant analysis.
  If inheritance is complex, look at whole genome mapping of quantitative trait loci (QTL).

Whole Genome Mapping
  Utilize 120 (+/-) polymorphic markers
  Microsatellites (SSRs)
    Codominant markers, technologically easy to use
    Work well for wide (indica/japonica) crosses
    Medium grain/long grain crosses useful too
  AFLP markers can be more useful for narrower crosses (e.g., within long grain japonicas)
    More polymorphisms found (because of the large number of markers scored per run)
    Predominantly dominant markers
    Technologically more demanding

SSR and AFLP marker gels
  SSR markers: can run 2 to 8 SSRs in each lane scored.
  AFLP markers: several bands in each lane; possible to score many polymorphisms per lane.

Whole Genome Mapping
  SNPs are another alternative
    Single nucleotide polymorphisms
    Becoming the new marker of choice
    Will cover later
  RFLP markers
    Still used, but labor intensive
    SSRs often used instead

Rice Genetic Marker Map
  [Figure: rice genetic marker map for the Lemont/TeQing cross, chromosomes 1-12, with marker names and intermarker distances. Pinson et al. 2000]

DNA Marker Example
  DNA from rice plants analyzed for a disease resistance marker
  [Figure: gel with lanes 1-30, each plant scored resistant (R) or susceptible (S); the DNA marker band is associated with the disease resistance gene.]

Whole Genome Mapping
  Look for the reproducible presence of a marker in progeny that display the trait.
  Markers closer to the trait on the chromosome will show tighter linkage to the trait.
  Linkage is measured in centimorgans (cM), a recombination distance.
    If the marker is 95% correct and 5% wrong in predicting the trait, the marker is about 5 cM from the gene controlling the trait.

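As a rough illustration of the 95%/5% rule above, the sketch below counts progeny whose marker score disagrees with their trait score and reports that percentage as an approximate distance in cM. The function name and the example scores are hypothetical, and treating percent mismatch as cM is only a reasonable approximation for tightly linked markers.

```python
def approx_distance_cm(marker_scores, trait_scores):
    """Percent of progeny where marker and trait disagree (roughly cM for tight linkage)."""
    if len(marker_scores) != len(trait_scores) or not marker_scores:
        raise ValueError("need two non-empty score lists of equal length")
    mismatches = sum(m != t for m, t in zip(marker_scores, trait_scores))
    return 100.0 * mismatches / len(marker_scores)


# Hypothetical data: 100 progeny scored R (resistant) or S (susceptible).
trait = ["R"] * 40 + ["S"] * 60
# The marker agrees with the trait in 95 plants and disagrees in 5.
marker = ["R"] * 38 + ["S"] * 2 + ["S"] * 57 + ["R"] * 3

print(approx_distance_cm(marker, trait))  # 5.0
```

With a dominant marker, heterozygous plants cannot be told apart from homozygous ones, so a real analysis would need mapping software rather than this simple count.
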
Whole Genome Mapping
  Whole genome mapping allows analysis of multiple genomic regions if simple inheritance is not clear.
  Most markers will be unlinked and have little association with trait expression.
  Good to reduce the number of markers analyzed:
    Candidate genes: knowledge of metabolic pathways and genes involved in metabolism
    Bulk segregant analysis (BSA)

Bulk segregant analysis
  Focuses only on the region of interest if the trait is simply inherited.
  One or two genes can be analyzed, but advanced generations are typically needed to identify single gene locations.
  Rapidly eliminates uninformative markers.

Bulk segregant analysis
  Progeny are bulked (combined) into groups of 8 (or so) individuals that have or do not have the trait.
    e.g., a bulk of resistant vs. a bulk of susceptible plants after a disease resistance screen
    Can look at several bulks simultaneously
  Markers are run on the bulks and parents to look for a change in the presence of a DNA marker band when comparing the bulked progeny.
    e.g., look for a band that is only in the resistant bulk

Bulk Segregant Analysis
  AFLP marker screen
  [Figure: AFLP gel with alternating resistant (R) and susceptible (S) bulk lanes.]

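The bulk comparison can be sketched as a simple filter over band scores. This is a minimal sketch, assuming each band has been scored present/absent in both parents and both bulks; the band names, dictionary keys, and function name are hypothetical.

```python
def candidate_bands(scores):
    """Names of bands that differ between bulks and match the parental pattern."""
    hits = []
    for name, s in scores.items():
        bulks_differ = s["R_bulk"] != s["S_bulk"]
        matches_parents = (s["R_bulk"] == s["R_parent"]
                           and s["S_bulk"] == s["S_parent"])
        if bulks_differ and matches_parents:
            hits.append(name)
    return hits


# Hypothetical band scores: True = band present, False = band absent.
scores = {
    "band_1": {"R_parent": True, "S_parent": False, "R_bulk": True, "S_bulk": False},
    "band_2": {"R_parent": True, "S_parent": True, "R_bulk": True, "S_bulk": True},
    "band_3": {"R_parent": True, "S_parent": False, "R_bulk": True, "S_bulk": True},
}

print(candidate_bands(scores))  # ['band_1']
```

Here band_2 is monomorphic and band_3 appears in both bulks, so only band_1 (present only in the resistant parent and resistant bulk) is kept for follow-up.
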
Bulk segregant analysis
  Commonly use RAPD or AFLP markers
    Genetic markers showing numerous polymorphisms on a single gel
    RAPD markers are cheap and easy, but results are frequently hard to reproduce
    AFLP markers are more technically demanding, but highly reproducible
  Resistance gene analogs (RGAs) used for finding candidate disease resistance markers

Bulk segregant analysis
  After finding a band of interest when comparing bulks, look at the separate individuals that make up each bulk.
  Confirm linkage of candidate marker bands with the trait of interest in a larger mapping population (100+ progeny).
  Find the genomic location of the candidate marker.

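One common way to check whether a candidate band and the trait segregate together in the larger population is a chi-square test of independence on a 2x2 table of counts. The slides do not name a specific test, so this is a sketch of one option; the counts and function name are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (1 degree of freedom) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one plant")
    return n * (a * d - b * c) ** 2 / denom


# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_5PCT_1DF = 3.841

# Hypothetical counts from 100 progeny:
#                 resistant  susceptible
# band present        45          5
# band absent          4         46
chi2 = chi_square_2x2(45, 5, 4, 46)
print(round(chi2, 2), chi2 > CRITICAL_5PCT_1DF)
```

A statistic above the critical value only says that band and trait are not inherited independently; estimating the map distance still requires counting recombinants.
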
[Figure: gel comparing the resistant (R) and susceptible (S) bulks with the individual plants that make up the bulks.]

Bulk segregant analysis
  Finding the location of a BSA candidate marker
    Isolate the marker band and confirm linkage with the trait (easier said than done).
    Sequence the candidate marker and perform a sequence search to find its physical location in the genome.
      BLAST searches using web-based software
      Gramene, NCBI, TIGR, and others have BLAST tools

Gene location also identified by...
  Prior gene mapping efforts
    Gramene: a cornucopia of information on trait inheritance, trait mapping, and detailed cross-referenced genome maps
  Gene has been cloned, identified, and sequenced in rice or other species
    Cloned doesn't mean there is a marker

Prior gene mapping efforts
  Very broad background of genetic information available
  Most work done in exotic or foreign rice
    Wide crosses between tropical (indica) and temperate (japonica) rice
  Limited amount of work done in USA rice
    Determine if trait or inheritance is relevant to USA germplasm

Now that location is known
  Inheritance studies indicate the chromosomal location of the gene (or genes) controlling the trait.
    Location may encompass a 5-10 (+) cM region.
    Previously identified markers in the region may have limitations.
  Reinvestigate location to identify markers useful for marker-assisted breeding.
    Preferably less than 2 cM from the trait.
    Useful in USA germplasm.
    Suitable for high throughput analysis.
  [Figure label: gene location]

Markers already in region of interest
  Types of markers that should be replaced:
    Low throughput markers: RFLP, AFLP
    Poorly reproducible markers: RAPD
    Markers that are not polymorphic for your germplasm
    Distant markers
      Depends on level of need, value of trait, cost of phenotyping
      Try replacing if more than 2 cM away from gene
    Lab limitations: isozymes (plant proteins)

Markers already in region of interest
  Types of markers to keep: SSR, STS, SNP markers
    Must consider lab capabilities and what kinds of markers you are able to score.
    Still need to test marker polymorphism for germplasm in use.

Finding new markers in gene region
  Good to test mapped SSR markers in region
    Gramene database of SSR markers
      Over 2,500 SSRs available (now 18,000+)
  Test for polymorphism in parents/germplasm of interest
  May have to test linkage with trait if genetic distance not known

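The keep/replace guidelines for markers already in the region can be sketched as a small checklist. This is only an illustration: the marker records, field names, and function name are hypothetical, and the 2 cM cutoff is the rule of thumb from the slides, which also note that it depends on need, trait value, and phenotyping cost.

```python
KEEP_TYPES = {"SSR", "STS", "SNP"}  # marker types the slides suggest keeping
MAX_DISTANCE_CM = 2.0               # rule of thumb: replace if farther than this


def triage(marker):
    """Return a list of concerns; an empty list means the marker looks usable."""
    concerns = []
    if marker["type"] not in KEEP_TYPES:
        concerns.append("replace: marker type")
    if not marker["polymorphic"]:
        concerns.append("replace: not polymorphic in this germplasm")
    if marker["distance_cM"] is None:
        concerns.append("test linkage: distance unknown")
    elif marker["distance_cM"] > MAX_DISTANCE_CM:
        concerns.append("replace: too far from gene")
    return concerns


# Hypothetical markers in a region of interest.
markers = [
    {"name": "ssr_a", "type": "SSR", "polymorphic": True, "distance_cM": 1.2},
    {"name": "rflp_b", "type": "RFLP", "polymorphic": True, "distance_cM": 0.8},
    {"name": "ssr_c", "type": "SSR", "polymorphic": False, "distance_cM": 4.5},
    {"name": "sts_d", "type": "STS", "polymorphic": True, "distance_cM": None},
]

for m in markers:
    print(m["name"], triage(m))
```
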
Candidate SSR marker identification
  [Figure: SSR markers on an entire chromosome (180 cM), with a focus on a specific region (15 cM).]
  Usually many SSRs available to test in region of interest

Finding new markers in gene region
  Sometimes no polymorphic SSRs are mapped in the region of interest.
  Identify new SSR markers by searching within the sequenced Nipponbare genome.
    Gramene, TIGR, and other database access points
    Download BAC sequences to find candidate SSR markers

SSR markers for Pi-z gene
  Gene mapped to center of chromosome 6.
  SSR markers available, but with limitations on polymorphism and linkage distance.
  Wanted to find SSR markers more closely linked with the gene than those already published.

SSR markers for Pi-z gene
  Downloaded DNA sequence information from 20 BAC clones (100-197 kbp in length) found in the 8 cM region surrounding the Pi-z gene.

SSR markers for Pi-z gene
  Looked for SSR regions (repeats of CT, AG, AAT, ATT, etc.) having 10 or more repeats.
  Tested 72 candidate SSR markers (2 from Gramene) for polymorphisms and quality of amplification.
    Tested on parents from three crosses
    19 did not work
    18 were monomorphic
    34 had limited polymorphism
    14 polymorphic for two or three crosses

SSR markers for Pi-z gene
  Note that 1 cM ≈ 593,000 bp of DNA in the Pi-z region

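The search for SSR regions with 10 or more repeats can be sketched with a regular expression scan over a downloaded sequence. This is a minimal sketch, not the procedure actually used for Pi-z: the function name and the test sequence are hypothetical, and only the four motifs named on the slide are searched.

```python
import re


def find_ssrs(sequence, motifs=("CT", "AG", "AAT", "ATT"), min_repeats=10):
    """Return (start, motif, repeat_count) for each run of at least min_repeats motifs."""
    seq = sequence.upper()
    hits = []
    for motif in motifs:
        pattern = re.compile("(?:%s){%d,}" % (motif, min_repeats))
        for match in pattern.finditer(seq):
            repeats = len(match.group(0)) // len(motif)
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


# Hypothetical sequence: 12 CT repeats, 9 AAT repeats (too short), 10 AG repeats.
seq = "GATTACA" + "CT" * 12 + "GGCC" + "AAT" * 9 + "TTGCA" + "AG" * 10 + "C"

print(find_ssrs(seq))  # [(7, 'CT', 12), (67, 'AG', 10)]
```

Start positions are 0-based. CT/AG and AAT/ATT are reverse complements of each other, so searching both finds the same repeat type on either strand. For scale, at roughly 593,000 bp per cM an 8 cM region corresponds to about 4.7 million bp.
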
STS markers
  Sequence Tagged Site markers
  Developed out of sequence info from:
    RFLP markers
    AFLP markers
    Cloned genes
    DNA sequence in region of interest
  PCR primers developed from sequence info for subsequent testing for marker polymorphism.

STS markers
  PCR products analyzed for polymorphism
    Dominant marker: presence/absence of marker
    Codominant marker
      Length difference: directly scored
      Sequence difference
        CAPS marker: test for differential digestion with restriction endonucleases
        SSCP, TILLING, etc.
        Leads to SNP-type marker

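The idea behind a CAPS marker can be sketched by digesting two PCR products in silico: a single base change that destroys a restriction site changes the fragment pattern. The allele sequences and function name below are hypothetical; EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example enzyme.

```python
def digest(amplicon, site, cut_offset):
    """Fragment lengths after cutting at every occurrence of a recognition site."""
    fragments = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


ECORI_SITE, ECORI_OFFSET = "GAATTC", 1  # EcoRI cuts G^AATTC

# Hypothetical 46 bp PCR products differing by one base inside the site.
allele_1 = "ACGT" * 5 + "GAATTC" + "TTGCA" * 4
allele_2 = "ACGT" * 5 + "GAGTTC" + "TTGCA" * 4

print(digest(allele_1, ECORI_SITE, ECORI_OFFSET))  # [21, 25]
print(digest(allele_2, ECORI_SITE, ECORI_OFFSET))  # [46]
```

Because a heterozygote would show all three fragment sizes, a marker of this kind can be scored as codominant.
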
STS dominant marker for Pi-b gene
  Marker derived from cloned Pi-b gene sequence.
  Cultivars with the Pi-b gene display an amplification product; those without the Pi-b gene show no amplification product.

STS marker for aroma derived from RFLP marker RG28
  Single nucleotide length difference in PCR amplification products.
  Marker is codominant.

SNP markers
  Single Nucleotide Polymorphism
  Can be in genes or intergenic regions.
  Can be in coding (exons) or non-coding regions of genes.
  Can be linked to the gene/trait, or
  can be THE functional single base change that affects the trait you are looking at.

SNP markers
  The vast majority of SNPs in your region of interest will not be in the gene controlling the trait.
  Identified by comparative sequencing.
  Limited database of SNP markers presently available.
    Won't (yet) find a set of SNP markers waiting to be used like the SSR or other markers available in the Gramene database.

SNP markers
  Markers can be found by independent sequencing efforts (in your own lab).
  Can also be found by in silico sequence comparisons.
    Download sequences and look for polymorphisms.
      Nipponbare and 93-11 comparisons
      cDNA comparisons between various randomly sequenced accessions

SNP markers for the Pi-z locus
  Example of SNP marker linked to gene, not in gene itself.
  Hayashi et al. (2004) performed exhaustive DNA sequencing efforts (over 79,000 bp sequenced) in the Pi-z gene region to find SNP markers linked with the Pi-z and Pi-zt genes.
    Presented 19 SNP markers.

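An in silico sequence comparison of the kind mentioned above can be sketched as a position-by-position scan of two sequences that have already been aligned. The sequences and function name are hypothetical, and the alignment step itself (for example with a BLAST tool) is assumed to have been done beforehand.

```python
def find_snps(aligned_a, aligned_b):
    """Return (position, base_a, base_b) for single-base differences; positions are 1-based."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    for position, (a, b) in enumerate(pairs, start=1):
        # Skip gaps ('-') and ambiguous bases such as N: they are not SNPs.
        if a != b and a in "ACGT" and b in "ACGT":
            snps.append((position, a, b))
    return snps


# Hypothetical aligned fragments from two accessions.
accession_1 = "ATGGCTAGCTAGGA"
accession_2 = "ATGGCTAGTTAGGA"

print(find_snps(accession_1, accession_2))  # [(9, 'C', 'T')]
```
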
SNP markers for the Pi-z gene
  [Figure from Hayashi et al. (2004): SNP markers for Pi-z shown on bottom.]

Waxy gene SNP markers
  The Waxy gene encodes the enzyme granule-bound starch synthase (GBSS) and is the major gene controlling amylose content in rice.
  Two mutations in the Waxy gene have been identified that are associated with either low or intermediate amylose content (Larkin and Park 2004; Chen et al. 2004).

Waxy gene SNP markers
  Exon 1 SNP in the leader intron 5' splice site
    Sequence at the site: G/Ttatac-----
    G present in intermediate and high amylose content rice.
    T present in low amylose content rice.

Waxy gene SNP markers
  Exon 6 SNP of non-glutinous accessions:
    A associated with high-amylose and low-amylose rice accessions
    C associated with intermediate-amylose rice accessions

Waxy gene SNP markers
  Combine Exon 1 and Exon 6 SNPs: four haplotypes

  Haplotype   Associated with                         Example accessions
  G-A         high-amylose rice accessions            Dixiebelle, Jodon, A201
  G-C         intermediate-amylose rice accessions    Lemont
  T-A         low-amylose rice accessions             Rico 1, Bengal
  T-C         glutinous rice accessions*

Development of SNP markers
  Potentially many more SNPs than SSRs.
  Technology developing for very high throughput; many instruments and platforms being devised to score SNPs.
  Expensive to develop markers/find polymorphisms.
    Many SNPs found in indica/japonica comparisons.
    Only a subset of those polymorphisms (10% or less) is found in any single japonica/japonica cross, and different polymorphisms are found in different crosses.

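The four Waxy haplotypes from the Exon 1 and Exon 6 SNPs can be sketched as a lookup table. This assumes the four associations on the slide are listed in the same order as the haplotypes (G-A, G-C, T-A, T-C); the function and variable names are hypothetical.

```python
# Keys are (Exon 1 base, Exon 6 base); values are the associated amylose class.
WAXY_HAPLOTYPES = {
    ("G", "A"): "high amylose",
    ("G", "C"): "intermediate amylose",
    ("T", "A"): "low amylose",
    ("T", "C"): "glutinous",
}


def amylose_class(exon1_base, exon6_base):
    """Look up the amylose class associated with a Waxy Exon 1 / Exon 6 haplotype."""
    key = (exon1_base.upper(), exon6_base.upper())
    if key not in WAXY_HAPLOTYPES:
        raise ValueError("unexpected haplotype: %s-%s" % key)
    return WAXY_HAPLOTYPES[key]


print(amylose_class("G", "C"))  # intermediate amylose
```

These are associations observed among accessions, so a lookup like this predicts an amylose class rather than measuring it.
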
Thanking the combined efforts of:
  Molecular Genetics: Joe Kepiro, Eric Christensen, Fran Pontasch, Mickey Frank
  Genetics: Shannon Pinson, Faye Seaberg
  Pathology: Toni Marchetti, Robert Shank
  Variety Improvement: Anna McClung, Jodie Cammack, Rick Boyd, Pat Carre
  Cereal Chemistry: Ming-Hsuan Chen, Christine Bergman, Janis Delgado, Naomi Gipson