HCS806 Summer 2010 Methods in Plant Biology: Breeding with Molecular Markers


Lecture 7. Populations

The foundation of any crop improvement program is built on populations. This session will explore population development. We will review resources available to access new genetic variation and common population structures used in genetic analysis. We will further compare and contrast populations used for genetic analysis with populations used by plant breeders.

The objectives of this module are to help students:
- understand the diversity of populations that are being used to test marker-trait associations (linkage);
- understand the difference between the discovery of linkage and the use of markers for selection;
- use this understanding of populations to facilitate interaction with colleagues from other disciplines (field, marker support, analysis, etc.); and
- use this understanding to design and implement discovery and selection projects.

At the end of this module you should:
- Be able to describe the genetic behavior of F2, backcross (BC), recombinant inbred line (RIL), and inbred backcross (IBC) populations.
- Be comfortable using segregation, independent assortment, and Hardy-Weinberg equilibrium to describe the expected frequency of alleles and heterozygosity in standard populations.
- Be familiar with software used to visualize graphical genotypes and comfortable with the use of graphical genotypes to describe populations.
- Be familiar with association mapping models.

Where to go to access new genetic variation

Only a few hundred years ago, exploration was driven by a quest for gold and plants (in the form of spice, seed, bulbs, and other plant parts). The importance of plants in the economies of European countries is illustrated by the economic bubble in the Dutch tulip market, which peaked in ~1637. Prized tulip bulbs sold for prices in the range of 10-fold the yearly income of a craftsman. Genetic resources (germplasm) remain a treasure.
The largest public collection of germplasm in the world is maintained by the United States Department of Agriculture (USDA), Agricultural Research Service (ARS), National Plant Germplasm System (NPGS). The NPGS houses collections in ~20 locations. Each location is responsible for curating and regenerating multiple species. The germplasm housed in the NPGS is freely available in small quantities for research purposes. Each germplasm accession is assigned a Plant Introduction (PI) number, and information on each accession is stored in the Germplasm Resources Information Network (GRIN) database. Searches in GRIN can be based on the PI number, on taxonomy, or on text queries (using standard operators such as AND to link words).

Figure 1. Homepage for the National Plant Germplasm System Germplasm Resources Information Network (GRIN) database.

Populations.

Population development is often taken to refer to so-called line crosses, which are developed by cross-pollination of inbred lines. Such a cross takes the form Parent 1 x Parent 2 -> F1. By convention, the female parent is listed first when describing such crosses. Self-pollination (selfing) of an F1 leads to an F2 population in which two alleles (p and q) with initial frequencies of 0.5 are in Hardy-Weinberg equilibrium (p^2 + 2pq + q^2 = 1).

Generation   Genotypes
Parent 1     pp (100%)
Parent 2     qq (100%)
F1           pq (100%)
F2           pp (25%) + pq (50%) + qq (25%)

The Punnett square for self-pollination of an F1 looks like:

             Pollen
Ovule        p      q
p            pp     pq
q            pq     qq
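The Punnett square above can be reproduced with a few lines of code. The following is a minimal sketch, not part of the original handout; the function name `self_f1` is a hypothetical label chosen for illustration. It enumerates every pollen-by-ovule gamete combination from a heterozygous F1 and tallies the resulting F2 genotype frequencies.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def self_f1(gametes=("p", "q")):
    """Genotype frequencies expected from selfing a heterozygous F1 (pq).

    Each gamete is assumed to be equally likely (frequency 0.5), as in the text.
    """
    # Every pollen x ovule combination is one cell of the Punnett square.
    cells = Counter("".join(sorted(pair)) for pair in product(gametes, repeat=2))
    total = sum(cells.values())
    return {genotype: Fraction(count, total) for genotype, count in cells.items()}


f2 = self_f1()
for genotype in ("pp", "pq", "qq"):
    print(genotype, float(f2[genotype]))
```

Using exact fractions rather than floats keeps the 1:2:1 ratio visible without rounding noise.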

Repeated self-pollination (inbreeding) drives populations toward homozygosity. With each generation of self-pollination, 50% of the remaining heterozygosity is lost, and alleles become fixed in the homozygous condition.

[Figure 2 plot: frequency (y axis) of heterozygotes (Cc) and homozygotes (CC + cc) by generation (x axis), F1 onward.]

Figure 2. The effect of inbreeding on allele frequencies. Review: inbreeding drives individuals toward homozygosity.

Figure 3. Graphical representation of an F2 population (two chromosomes, no recombination illustrated).
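The halving of heterozygosity with each generation of selfing can be written as a one-line formula: the expected heterozygous fraction in generation Fn is (1/2)^(n-1). The sketch below is an illustration added here, with the hypothetical function name `heterozygosity`; it tabulates the two curves plotted in Figure 2 for a single locus (Cc versus CC + cc).

```python
def heterozygosity(generation):
    """Expected heterozygous fraction at one locus in the Fn generation.

    The F1 is fully heterozygous (1.0); each round of selfing halves the value.
    """
    if generation < 1:
        raise ValueError("generation must be 1 or greater")
    return 0.5 ** (generation - 1)


for n in range(1, 8):
    het = heterozygosity(n)
    print(f"F{n}: Cc = {het:.4f}, CC+cc = {1 - het:.4f}")
```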

We can describe a population derived from line crosses, and the individuals that make up the population, in terms of the relationship of alleles between and within individuals.

F2:
- Expected segregation of alleles: 1:2:1
- Number of alleles at each locus: at most 2
- Alleles shared between progeny and parents: 50% of alleles are shared by descent
- Alleles shared between F2 progeny: 50% of alleles are shared by descent
- Alleles within individuals: 50% homozygous, 50% heterozygous

If we take this population to the next generation (F3):
- Alleles shared between progeny and parents: 50% (of the parental genotype)
- Alleles shared within each F3 family: 75%
- Alleles shared between F3 families: 50%
- Alleles within individuals: 75% homozygous, 25% heterozygous

As the population is selfed toward homozygosity (recombinant inbred lines):
- Each line will share 50% of alleles with each parent
- Each line will share 100% of alleles within individuals within the line
- Each line will share 50% of alleles with other progeny lines
- Each line will approach 100% homozygous

Backcross populations.

Backcross populations are derived from crossing the F1 back to one of the parents (the recurrent parent). Typically BC designs are used to introduce one or a few alleles from the donor parent into the recurrent parent background.

Figure 4. Pedigree of BC2 Advanced Backcross (AB) and BC2S5 Inbred Backcross (IBC) populations.

Parent 1 x Parent 2 (Donor)
F1 x Parent 1
BC1 (n lines)

BC1-1 x Parent 1  ->  BC2-1 S0  ...  BC2-1 S5
BC1-2 x Parent 1  ->  BC2-2 S0  ...  BC2-2 S5
..
BC1-n x Parent 1  ->  BC2-n S0  ...  BC2-n S5
                      (AB)           (IBC)

Wehrhahn and Allard (1965, Genetics 51) described the inbred backcross (IBC) population as a method of isolating and studying individual quantitative trait loci (QTL). The work described in their 1965 publication represents an early attempt to Mendelize QTL. In the late 1990s Steve Tanksley demonstrated that variations of the IBC could be used for simultaneous introgression and mapping of QTL. He worked with populations that had not yet been selfed to homozygosity, calling these Advanced Backcross (AB) populations. Figure 4 illustrates the relationship between AB and IBC populations. The concept of Mendelizing QTL has culminated in the cloning of genes that affect quantitative traits for susceptibility to disease, milk production, fruit size, fruit shape, and more.

Properties of AB and IBC populations.

Table 1. Expected segregation, single seed descent genotype, and plot genotypes for AB and IBC populations.

             AB                               IBC
Generation   Single seed     Plot             Single seed     Plot
             genotype        genotype         genotype        genotype
BC1          1:1 (Aa:AA)     (Aa:AA)          3:1 (AA:aa)     (AA:aa)
BC2          3:1 (AA:Aa)     (AA:(Aa,aa))     7:1 (AA:aa)     (AA:aa)
BC3          7:1 (AA:Aa)     (AA:(Aa,aa))     15:1 (AA:aa)    (AA:aa)

Where AA is the recurrent parent genotype and aa is the donor parent genotype.

Table 2. IBC population size needed to recover a specific locus in BC3S5 lines at different probabilities.

Number of lines recovered     Number of lines in population for a given probability of recovery (1)
with a specific locus         95%     90%     80%     75%     66%
[table values not preserved in this transcription]

(1) Calculated from N = ln(1 - probability) / ln(1 - frequency), with probabilities of 95%, 90%, 80%, 75%, and 66% and a frequency of (0.5)^(k+1), where k = 3 for BC3S5.
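The population-size formula in the table footnote, N = ln(1 - probability) / ln(1 - frequency) with frequency = (0.5)^(k+1), is easy to evaluate directly. The sketch below is an illustration rather than the calculation used to build the original tables: the function names are hypothetical, and rounding N up to a whole line is an assumption made here. It gives the number of lines needed to recover a donor locus in at least one line.

```python
import math


def donor_frequency(k):
    """Expected frequency of a donor locus in BCk inbred lines: (0.5)^(k+1)."""
    return 0.5 ** (k + 1)


def lines_needed(probability, k):
    """N = ln(1 - probability) / ln(1 - frequency), rounded up (assumption)."""
    frequency = donor_frequency(k)
    return math.ceil(math.log(1 - probability) / math.log(1 - frequency))


for k in (2, 3):
    for probability in (0.95, 0.90, 0.80, 0.75, 0.66):
        n = lines_needed(probability, k)
        print(f"BC{k}S5, P = {probability:.2f}: N = {n}")
```

Note that the donor frequencies used here, 1/8 for k = 2 and 1/16 for k = 3, match the 7:1 and 15:1 IBC ratios for BC2 and BC3 in Table 1.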

Table 3. IBC population size needed to recover a locus in BC2S5 lines at specified probabilities.

Number of lines recovered     Number of lines in population for a given probability of recovery (1)
with a specific locus         95%     90%     80%     75%     65%
[table values not preserved in this transcription]

(1) Calculated from N = ln(1 - probability) / ln(1 - frequency), with probabilities of 95%, 90%, 80%, 75%, and 65% and a frequency of (0.5)^(k+1), where k = 2 for BC2S5.

What do these tables mean? For a BC2S5 population of 113, 95% of the markers would detect at least 5 individuals with the donor parent locus. Progeny segregation for a single locus would therefore range from 5:108 to 23:90, with an expected average of 14:99 (1:7).

The concept of a graphical genotype.

The graphical genotype was introduced as a way to visualize the genotype of individuals and populations in a graphical format (Young, N. D. and S. D. Tanksley. Restriction fragment length polymorphisms maps and the concept of graphical genotypes. Theoretical and Applied Genetics 77(1)). When allele states are coded and arranged in map order (either genetic or physical order), the graphical genotype displays the parental origin and allelic composition across the entire genome.

Figure 5. The graphical genotype for a single individual.

Not only does the graphical genotype provide an overview of how variation is distributed across genomes, but examining the graphical genotypes of informative recombinants also provides an intuitive approach to ordering genes. Figure 6 illustrates a case for ordering a disease resistance phenotype (vsc) relative to markers. In this example genotypes are scored 0 and 1, where 0 and 1 represent the genotypes 00 and 01 from a backcross (data set lnkspt; CT202 is a marker that has been added to the population).
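The BC2S5 interpretation given earlier on this page (113 lines, donor frequency 1/8, expected segregation near 14:99) can be explored with a simple binomial model. This is a sketch under a stated assumption: treating each line as an independent draw with donor frequency (0.5)^3 is a modeling choice made here, not necessarily the method behind the original tables, and the function name `prob_at_least` is hypothetical.

```python
import math


def prob_at_least(n_lines, freq, minimum):
    """Binomial probability that at least `minimum` of `n_lines` carry the donor locus."""
    below = sum(
        math.comb(n_lines, i) * freq**i * (1 - freq) ** (n_lines - i)
        for i in range(minimum)
    )
    return 1 - below


n_lines = 113
freq = 0.5 ** 3  # donor frequency for BC2S5, (0.5)^(k+1) with k = 2

expected_donor = n_lines * freq
print(f"expected donor lines: {expected_donor:.1f} of {n_lines}")
print(f"P(at least 5 donor lines): {prob_at_least(n_lines, freq, 5):.4f}")
```

Under this model, the threshold of 5 lines quoted in the text is a conservative lower bound rather than a typical outcome.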

[Figure 6 panels: genotype columns shown in two orders, (gen, vsc, TG23, PTO, CT202) and (gen, PTO, vsc, CT202, TG23); heterozygous segments labeled H.]

Figure 6. Graphical genotypes of recombinant individuals.

In practice, establishing gene order is done by various approaches:
- Minimizing the double recombinants necessary to explain an order.
- Minimizing the magnitude of the two-locus recombination fractions.

[Figure 7 panels: Chromosome 1 and Chromosome 2; germplasm groups FM, Heirloom, Processing, Wild.]

Figure 7. Graphical genotypes for Chromosome 1 and Chromosome 2 of tomato. The linkage map is illustrated to the left of each graph. Markers are color coded based on allele number and state. Data are illustrated for a germplasm collection consisting of fresh-market (FM) and processing tomato breeding material, older (heirloom) varieties, and wild species used as donor parents.
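The idea of choosing the gene order that requires the fewest recombination events can be sketched in code. The example below is illustrative only: the markers A, B, and C and the six genotypes are hypothetical data invented for the demonstration (they are not the lnkspt data set), and exhaustive search over all orders is practical only for a handful of markers.

```python
from itertools import permutations


def count_crossovers(individuals, order):
    """Count allele-state switches between adjacent markers, summed over individuals."""
    total = 0
    for genotype in individuals:
        states = [genotype[marker] for marker in order]
        total += sum(a != b for a, b in zip(states, states[1:]))
    return total


def best_orders(individuals, markers):
    """Return the marker orders needing the fewest crossovers, and that count."""
    scored = {
        order: count_crossovers(individuals, order)
        for order in permutations(markers)
    }
    fewest = min(scored.values())
    return sorted(o for o, s in scored.items() if s == fewest), fewest


# Hypothetical backcross genotypes scored 0/1 at three hypothetical markers.
individuals = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 0, "C": 0},
    {"A": 0, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 0},
]

orders, fewest = best_orders(individuals, ("A", "B", "C"))
print(orders, fewest)
```

An order and its reverse always score the same, so the search returns both; orders that place B at an end would force double recombinants in some individuals and therefore score worse.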

A tool to create graphical genotypes was described by Van Berloo (2008). GGT (Graphical GenoTypes) is a software package for the graphical representation of molecular marker data, which can assist in the process of selection and evaluation of plant material. The software and manual are available online.

What about populations used by plant breeding programs?

Plant breeding programs usually consist of progeny derived from many populations that are evaluated simultaneously. These populations therefore have a complex structure that affects how traits are mapped. Human geneticists have long worked with large unstructured populations to identify associations between polymorphisms in candidate genes and susceptibility to disease (cancer and diabetes), mental health problems, and physiological condition (obesity). From these large studies we have learned that the ability to detect associations in complex populations can be affected by underlying structure in the sample. We will explore issues of complex populations using a case study based on bacterial spot resistance.

Before exploring more complex examples of factors that influence our ability to detect marker-trait linkage and to select in complex populations, it will help to first review a simple example of why linkage between markers and traits does not always transfer from one population to another.

Line     M1     Rx-3 locus     M2
OH75     1      R (Rx-3)       1
OH86     0      S (rx-3)       1
FL82     1      S (rx-3)       0

Figure 8. Illustration of linkage between molecular markers (M1 and M2) and a disease resistance locus Rx-3.

In Figure 8, resistance conferred by Rx-3 is linked to M1 allele 1 and M2 allele 1 in line OH75. This association is broken for M2 in line OH86 and for M1 in line FL82, because each of these susceptible lines carries allele 1 at that marker. Thus in a cross between OH75 and OH86, linkage between M1 and Rx-3 can be detected, but not between M2 and Rx-3. The converse is true for a cross between OH75 and FL82, where only marker M2 will be suitable to detect linkage.
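The reasoning behind Figure 8 reduces to a simple rule: a marker can reveal linkage to Rx-3 in a given cross only if the two parents differ at both the marker and the resistance locus. The sketch below encodes the three lines from Figure 8 and applies that rule; the dictionary layout and the function name `informative_markers` are hypothetical choices made for illustration.

```python
# Marker alleles and resistance phenotype for each line, as given in Figure 8.
LINES = {
    "OH75": {"M1": 1, "Rx-3": "R", "M2": 1},
    "OH86": {"M1": 0, "Rx-3": "S", "M2": 1},
    "FL82": {"M1": 1, "Rx-3": "S", "M2": 0},
}


def informative_markers(parent_a, parent_b, markers=("M1", "M2")):
    """Markers that are polymorphic in a cross that also segregates for Rx-3."""
    a, b = LINES[parent_a], LINES[parent_b]
    if a["Rx-3"] == b["Rx-3"]:
        # Resistance does not segregate, so no marker-trait linkage can be detected.
        return []
    return [marker for marker in markers if a[marker] != b[marker]]


for cross in (("OH75", "OH86"), ("OH75", "FL82"), ("OH86", "FL82")):
    print(" x ".join(cross), "->", informative_markers(*cross))
```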

Data sets for case study

References:

Frary, A., T. M. Fulton, D. Zamir, and S. D. Tanksley. Advanced backcross QTL analysis of a Lycopersicon esculentum x L. pennellii cross and identification of possible orthologs in the Solanaceae. Theor Appl Genet. 108.

Van Berloo, Ralph. 2008. GGT 2.0: Versatile software for visualization and analysis of genetic data. Journal of Heredity. 99(2).

Wehrhahn, C. and R. W. Allard. 1965. The detection and measurement of the effect of individual genes involved in the inheritance of a quantitative character in wheat. Genetics 51.

Young, N. D. and S. D. Tanksley. Restriction fragment length polymorphisms maps and the concept of graphical genotypes. Theoretical and Applied Genetics. 77(1).

See also the Generation Challenge Programme Crop Bioinformatics Course.