Identifying Genes Underlying QTLs

Reading:
Frary, A. et al. 2000. fw2.2: A quantitative trait locus key to the evolution of tomato fruit size. Science 289:85-87.
Paran, I. and D. Zamir. 2003. Quantitative traits in plants: Beyond the QTL. Trends in Genet. 19:303-306.
Yu, J. et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genet. 38:203-208.

In previous lectures on marker-assisted selection, we saw that selection on linked markers can be useful under certain circumstances. Two limitations of marker-assisted selection are the loss of a target allele through recombination between the marker and the target gene during selection, and the sometimes poor predictive ability of markers across populations (or crosses). One way to circumvent these two specific problems is to develop markers that are part of the gene sequence that causes the phenotypic difference underlying the QTL. The most difficult part of this approach is identifying the specific gene or sequence that causes the genetic difference.

Several ways to tackle this problem are outlined below, but note that they are very intensive processes, requiring large investments in time and money. Therefore, these approaches should be targeted at traits that are of extremely high value for selection but are hard to select for, and at cases where there is a reasonable chance of success. Thus, at this point, these methods are most suitable for QTLs with relatively large effects on the phenotype.

Map-based cloning

Map-based cloning is, in a sense, simply very high-resolution QTL mapping. If you can define the interval of a QTL to within a very small physical distance, you can sequence this region, identify the gene(s) in the region, and use various methods to determine which genes correspond to the QTL. This is not at all trivial. Recall that in QTL mapping we can typically define an interval containing a QTL only to somewhere on the order of 20 cM.
How many genes exist in a 20 cM interval? As a rough estimate, maize is predicted to have about 60,000 genes, and maize genetic maps tend to be around 2,000 cM in total. Therefore, a 20 cM interval represents about 1% of the genetic map length. If genes are distributed evenly along the genetic map, then on average we expect to find 1% of 60,000 genes, or 600 genes, in such an interval. Thus, even if you already have the sequence available for such a region, the problem remains of determining which gene or genes among these 600 cause the phenotypic effect of the QTL. If you do not already have sequence for this region, you may first need to sequence it. Again using maize as an example, the total nuclear genome size is 2,500 Mbp of DNA. Therefore, if 20 cM represents 1% of the physical length of the maize genome (which it should on average), such a segment is expected to contain 25 Mbp, which is a massive region to sequence. Worse, you may be unlucky and your QTL may reside in a recombination cold spot, in which case the region could be much larger.

Therefore, to reduce the size of the region that needs to be sequenced and the number of candidate genes that need to be sifted through, one needs to narrow the likely interval containing the QTL as much as possible.

As an example of how this can be done, consider the Frary et al. (2000) study, which cloned a QTL with major effect on tomato fruit size. This work was based on the fine-mapping studies of Alpert and Tanksley (1996). They began by making introgression lines, backcrossing gene regions from a wild tomato into domesticated tomato. One of the introgression lines was nearly isogenic with the recurrent domesticated parent: it contained almost all of the recurrent parent genome except for an introgression block around the previously mapped tomato fruit weight QTL on chromosome 2. They crossed this near-isogenic line (NIL) to the recurrent parent, produced a huge F2 population of 3472 plants, and screened each of these plants with two RFLP markers flanking the QTL interval. They identified 55 F2 plants that had recombinations between these two markers, and grew progeny of those plants in the field for phenotyping and high-resolution QTL mapping. Additional markers were mapped within this small region so that, based on the phenotypic data of the recombinant families, the QTL could be localized to a region of 0.13 cM. At the same time, these markers were used to screen a library of yeast artificial chromosomes (YACs), which can hold up to 1 Mbp of DNA, to identify those YACs that contain the sequences corresponding to the markers. Once such YACs were identified, they were cut with restriction enzymes, and specific pieces of the YACs were mapped in this region. In this way, they determined that the 0.13 cM interval containing the QTL spanned a physical size of about 150 kb.

Next, Frary et al. (2000) screened a cDNA library with this YAC to identify expressed gene sequences derived from the YAC and found four unique gene transcripts. They then directly tested each of these four genes for effects on fruit weight by transforming a large-fruited line with the small-fruited allele of each of the four genes separately. In this way, they directly confirmed that the fw2.2 QTL was caused by the ORFX gene, for which they had cDNA clones.

Association Mapping

All of the QTL mapping methods described in previous lectures have relied on a population developed from a cross of two parental lines. This requires the development of mapping populations, which, as in the case of recombinant inbred lines, may take multiple generations before the study can even be initiated. An alternative approach, termed association analysis, exploits the genetic variation already present in breeding populations or germplasm collections. As in typical QTL mapping, one tests for an association between variation at a known gene or marker and a phenotype in the germplasm collection tested. Such associations occur if the gene being tested actually causes the phenotypic differences, or if there is linkage (gametic phase) disequilibrium between the gene being tested and the gene(s) causing the phenotypic differences. We might be content to find markers linked to causal QTLs, but the problem is that linkage disequilibrium does not actually imply linkage (which is why I will refer to it as gametic phase disequilibrium).

Gametic phase disequilibrium is the nonrandom association of alleles at different loci. It is measured as the difference between the observed and expected frequencies of allele pairs. For example, gametic disequilibrium between alleles a and b at loci A and B, respectively, is measured as:

$D_{ab} = p_{ab} - p_a p_b$,

where $p_{ab}$ is the frequency of the ab gamete or haplotype, and $p_a$ and $p_b$ are the allele frequencies of a and b, respectively. Gametic phase disequilibrium is reduced by random mating, but can be increased by population subdivision, recent population hybridization, and mutation. It can be maintained by physical linkage and by selection on epistatically interacting loci. So, genes that are tightly linked are more likely to be in gametic disequilibrium, but even tightly linked genes may be in gametic equilibrium if there has been a sufficient history of recombination between them. Conversely, genes on different chromosomes can be in gametic disequilibrium due to selection or population structure. Do you expect gametic disequilibrium to be more extensive in maize or in wheat?

QTL mapping proceeds by artificially creating populations in which the level of linkage disequilibrium is solely a function of recombination frequency. In QTL mapping, therefore, a gene should be associated with a phenotype only if the gene is linked to an underlying QTL. In germplasm collections, breeding populations, or natural populations, however, gametic disequilibrium can arise from either linkage or population structure. Therefore, performing an association analysis in such populations requires separating the effects of population structure from those of linkage.
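
The definition of gametic phase disequilibrium given above can be made concrete with a small calculation. The following is a minimal sketch, not part of the original notes: the haplotype sample is invented toy data, the allele labels `a_alt` and `b_alt` are hypothetical names for the alternative alleles, and the decay formula D_t = (1 - r)^t D_0 under random mating is the standard textbook result, added here as an assumption to illustrate why random mating reduces disequilibrium.

```python
from collections import Counter


def gametic_disequilibrium(haplotypes, allele_a="a", allele_b="b"):
    """Return D_ab = p_ab - p_a * p_b for a sample of two-locus haplotypes.

    Each haplotype is a tuple (allele at locus A, allele at locus B).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Observed frequency of the ab haplotype.
    p_ab = counts[(allele_a, allele_b)] / n
    # Marginal allele frequencies at each locus.
    p_a = sum(c for (x, _), c in counts.items() if x == allele_a) / n
    p_b = sum(c for (_, z), c in counts.items() if z == allele_b) / n
    return p_ab - p_a * p_b


def decay_under_random_mating(d0, recombination_fraction, generations):
    """Assumed textbook decay: D_t = (1 - r)^t * D_0."""
    return d0 * (1 - recombination_fraction) ** generations


# Toy sample (invented): the ab and a_alt/b_alt haplotypes are over-represented.
sample = (
    [("a", "b")] * 4
    + [("a", "b_alt")] * 1
    + [("a_alt", "b")] * 1
    + [("a_alt", "b_alt")] * 4
)
d_sample = gametic_disequilibrium(sample)

# Ten generations of random mating: tightly linked (r = 0.01) vs unlinked (r = 0.5).
d_linked = decay_under_random_mating(d_sample, 0.01, 10)
d_unlinked = decay_under_random_mating(d_sample, 0.5, 10)
print(d_sample, d_linked, d_unlinked)
```

In this toy sample p_ab = 0.4 while p_a p_b = 0.25, so D is about 0.15. Under the assumed decay formula, most of that disequilibrium persists after ten generations for the tightly linked pair, whereas it has almost vanished for the unlinked pair, which matches the statement that tight linkage maintains disequilibrium.
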
Population structure can be separated from linkage by first genotyping the population under study with a set of random markers, such as SSRs, to characterize the relationships among lines in the population. For example, when this was done in a sample of 260 maize lines collected from around the world, the lines were found to group primarily into three subpopulations: Stiff Stalk, Non-Stiff Stalk, and Tropical/Subtropical, which correspond to the major heterotic groups recognized by corn breeders (Liu et al., 2003, Genetics 165:2117). By assigning each line a probability that it belongs to each of the three groups, most of the effects of population structure can be accounted for. Then, the effects of markers or candidate genes can be tested while using the subpopulation membership probabilities as cofactors in the analysis model:

$Y = X\beta + S\alpha + Qv + e$,

where:
$X\beta$ accounts for environment, block effects, etc.;
$S$ contains indicators of which candidate gene allele each line carries;
$\alpha$ are the candidate gene or marker effects;
$Q$ contains the columns assigning the probability that each line belongs to each subpopulation;
$v$ are the effects of subpopulations; and
$e$ are the residual error effects.

More recently, Yu et al. (2006) extended this model to account for pairwise genetic relationships among all of the individuals in the study. This allows finer-scale correction for genetic relationships among the lines, because even within a subpopulation there will be differences in how closely or distantly related the individuals are. The random genetic marker information can be used both to assign lines to subpopulations (the Q matrix) and to estimate pairwise relationships between individuals (the K matrix):

$Y = X\beta + S\alpha + Qv + Zu + e$,

where the model is the same as above, with the addition of $Z$, which indicates the genotype, and $u$, the genetic background effects, whose variance-covariance matrix is equal to $K V_g$ (where $V_g$ is the genetic background variance).

With these models, the effect of the gene being tested is effectively separated from population structure effects. If a significant effect is observed, one must then determine whether the gene being tested is the actual gene causing the phenotypic effect, or whether it is linked to some other gene(s) causing the phenotypic difference. To separate the effects of the gene itself from those of linked genes, one must carefully study the extent of gametic phase disequilibrium in the population being studied and in the genome region tested.

These last two points are key: gametic disequilibrium depends strongly on the sample of genotypes tested and on the genome region. For example, in a diverse sample of maize inbreds, disequilibrium tends to decrease rapidly along the chromosomes (Remington et al., 2001, PNAS 98:11479), such that a specific gene affecting flowering time could be identified with association analysis (Thornsberry et al., 2001, Nat. Genet. 28:286). In contrast, in a highly selected, elite group of inbred lines from a private company, substantial linkage disequilibrium was found to extend up to 100 kb (Rafalski, 2002, Curr. Op. Plant Biol. 5:94), presumably due to selection. Furthermore, even in the diverse maize line set, disequilibrium was more extensive in some genome regions than in others (Remington et al., 2001).

With this in mind, the association mapping strategy needs to take account of how extensive gametic phase disequilibrium is in the target population. If disequilibrium is extensive, the resolution of the analysis will be reduced, but, on the other hand, one does not need to test the causal gene itself; instead, one may identify QTL using marker loci. In contrast, if disequilibrium is limited, one may have the resolution needed to identify the causal gene affecting a trait, but random marker genes are then unlikely to be associated with the trait. Therefore, if one does not have a good set of candidate genes that are likely to be involved in the trait, association analysis in a population with limited disequilibrium is probably not a good idea.
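
To illustrate why the subpopulation cofactors in the model Y = Xβ + Sα + Qv + e matter, here is a minimal sketch under stated assumptions, not the method used in the cited studies. The eight-line data set is invented, X is reduced to an intercept, Q is a single hard 0/1 membership column (one column is dropped because membership columns that sum to 1 are collinear with the intercept), and the fit is ordinary least squares with no significance test. The helper `ols` is a hypothetical name. The relationship-matrix (K) extension is not shown because it requires a mixed-model solver.

```python
def ols(columns, y):
    """Least-squares coefficients for y regressed on the given columns.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting. Intended for tiny, well-conditioned toy problems.
    """
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(u * w for u, w in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(u * t for u, t in zip(columns[i], y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Invented toy data: eight lines, two subpopulations.
intercept = [1.0] * 8
subpop = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # Q: membership in subpopulation 2
marker = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0]  # S: allele indicator, correlated with Q
# The phenotype depends only on subpopulation; the marker has no true effect.
phenotype = [10.0 + 4.0 * q for q in subpop]

# Naive model: Y = intercept + S*alpha + e (population structure ignored).
_, naive_alpha = ols([intercept, marker], phenotype)

# Structure-corrected model: Y = intercept + S*alpha + Q*v + e.
_, adjusted_alpha, subpop_effect = ols([intercept, marker, subpop], phenotype)

print(naive_alpha, adjusted_alpha, subpop_effect)
```

In this toy example the naive model attributes an effect of about 2 to the marker purely because the marker allele is more common in the higher-valued subpopulation. Once the Q column is included, the estimated marker effect is essentially zero and the difference is assigned to the subpopulation effect instead.
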