The principles of QTL analysis (a minimal mathematics approach)

Similar documents
QTL mapping in mice. Karl W Broman. Department of Biostatistics Johns Hopkins University Baltimore, Maryland, USA.

QTL mapping in mice. Karl W Broman. Department of Biostatistics Johns Hopkins University Baltimore, Maryland, USA.

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

POPULATION GENETICS Winter 2005 Lecture 18 Quantitative genetics and QTL mapping

Review of statistical methods for QTL mapping in experimental crosses

Gene Mapping in Natural Plant Populations Guilt by Association

2014 Pearson Education, Inc. Mapping Gene Linkage

Genetics of dairy production

Mapping and Mapping Populations

Statistical Methods for Quantitative Trait Loci (QTL) Mapping

Trudy F C Mackay, Department of Genetics, North Carolina State University, Raleigh NC , USA.

GENETICS - CLUTCH CH.20 QUANTITATIVE GENETICS.

Summary for BIOSTAT/STAT551 Statistical Genetics II: Quantitative Traits

Using Mapmaker/QTL for QTL mapping

Genetics or Genomics?

Identifying Genes Underlying QTLs

Why do we need statistics to study genetics and evolution?

More Precise QTL Mapping Using STAIRS

Quantitative Genetics

More QTL for owering time revealed by substitution lines in Brassica oleracea

Estimation of Genetic Recombination Frequency with the Help of Logarithm Of Odds (LOD) Method

Human linkage analysis. fundamental concepts

7.03 Problem Set 2 Due before 5 PM on Friday, September 29 Hand in answers in recitation section or in the box outside of

QTL Mapping Using Multiple Markers Simultaneously

Human linkage analysis. fundamental concepts

Genetics Lecture Notes Lectures 6 9

Chapter 14. Mendel and the Gene Idea

Association Mapping. Mendelian versus Complex Phenotypes. How to Perform an Association Study. Why Association Studies (Can) Work

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Quantitative Genetics, Genetical Genomics, and Plant Improvement

Computational Genomics

Variation Chapter 9 10/6/2014. Some terms. Variation in phenotype can be due to genes AND environment: Is variation genetic, environmental, or both?

Chapter 9. Gene Interactions. As we learned in Chapter 3, Mendel reported that the pairs of loci he observed segregated independently

Genetic dissection of complex traits, crop improvement through markerassisted selection, and genomic selection

Computational Workflows for Genome-Wide Association Study: I

Questions/Comments/Concerns/Complaints

Observing Patterns In Inherited Traits

Mendel and the Gene Idea

Conifer Translational Genomics Network Coordinated Agricultural Project

Permutation Tests for Multiple Loci Affecting a Quantitative Character

Experimental Design and Sample Size Requirement for QTL Mapping

MAS refers to the use of DNA markers that are tightly-linked to target loci as a substitute for or to assist phenotypic screening.

Gene Linkage and Genetic. Mapping. Key Concepts. Key Terms. Concepts in Action

Observing Patterns in Inherited Traits. Chapter 11

Mendelian & Non Mendelian Genetics. Copy Dr. M. A. Fouad

Construction of a genetic map, mapping of major genes, and QTL analysis

Content Objectives Write these down!

Association studies (Linkage disequilibrium)

Marker-Assisted Selection for Quantitative Traits

Introduction to Plant Genetics Spring 2000

Implementing direct and indirect markers.

On the Power to Detect SNP/Phenotype Association in Candidate Quantitative Trait Loci Genomic Regions: A Simulation Study

Ch. 14 Mendel and the Gene Idea

-Genes on the same chromosome are called linked. Human -23 pairs of chromosomes, ~35,000 different genes expressed.

Genetics & The Work of Mendel

Introduction to PLABQTL

Midterm#1 comments#2. Overview- chapter 6. Crossing-over

A simple and rapid method for calculating identity-by-descent matrices using multiple markers

Mapping Genes Controlling Quantitative Traits Using MAPMAKER/QTL Version 1.1: A Tutorial and Reference Manual

I.1 The Principle: Identification and Application of Molecular Markers

Answers to additional linkage problems.

Chapter 14: Mendel and the Gene Idea

Application of MAS in French dairy cattle. Guillaume F., Fritz S., Boichard D., Druet T.

LECTURE 5: LINKAGE AND GENETIC MAPPING

Genetics & The Work of Mendel

DESIGNS FOR QTL DETECTION IN LIVESTOCK AND THEIR IMPLICATIONS FOR MAS

Introduction to Quantitative Trait Locus (QTL) Mapping. Summer Institute in Statistical Genetics

LD Mapping and the Coalescent

SAMPLE MIDTERM QUESTIONS (Prof. Schoen s lectures) Use the information below to answer the next two questions:

Introduction to Linkage. Introduction to Linkage

Let s call the recessive allele r and the dominant allele R. The allele and genotype frequencies in the next generation are:

Lecture #15 10/8 Dr. Wormington

HCS806 Summer 2010 Methods in Plant Biology: Breeding with Molecular Markers

Use of marker information in PIGBLUP v5.20

Introduction to quantitative genetics

HCS806 Summer 2010 Methods in Plant Biology: Breeding with Molecular Markers

CHAPTER 10: Patterns of Inheritance

Understandings, Applications and Skills (This is what you maybe assessed on)

Course Announcements

Dr. Mallery Biology Workshop Fall Semester CELL REPRODUCTION and MENDELIAN GENETICS

THE STUDY OF GENETICS is extremely

Introduction to Quantitative Genetics

3. A form of a gene that is only expressed in the absence of a dominant alternative is:

Ch. 14 Reminder: Unlinked Genes & Independent Assortment. 1. Cross: F1 dihybrid test cross: DO the Punnett Square

Optimal selection on two quantitative trait loci with linkage

EFFICIENT DESIGNS FOR FINE-MAPPING OF QUANTITATIVE TRAIT LOCI USING LINKAGE DISEQUILIBRIUM AND LINKAGE

Genetics Effective Use of New and Existing Methods

Genetics. Ms. Gunjan M. Chaudhari

Would expect variation to disappear Variation in traits persists (Example: freckles show up in unfreckled parents offspring!)

Video Tutorial 9.1: Determining the map distance between genes

Chapter 9. Objectives. Table of Contents. Gregor Mendel. Gregor Mendel, continued. Section 1 Mendel s Legacy. Section 2 Genetic Crosses

We can use a Punnett Square to determine how the gametes will recombine in the next, or F2 generation.

Efficiency of selective genotyping for genetic analysis of complex traits and potential applications in crop improvement

Introduction to Quantitative Genomics / Genetics

Conifer Translational Genomics Network Coordinated Agricultural Project

Unit 3: Sustainability and Interdependence

QUANTITATIVE trait loci (QTL) mapping studies have

Chapter 14. Mendel and the Gene Idea

PUBH 8445: Lecture 1. Saonli Basu, Ph.D. Division of Biostatistics School of Public Health University of Minnesota

Relaxed Significance Criteria for Linkage Analysis

Transcription:

Journal of Experimental Botany, Vol. 49, No. 37, pp. 69 63, October 998 The principles of QTL analysis (a minimal mathematics approach) M.J. Kearsey Plant Genetics Group, School of Biological Sciences, The University of Birmingham, Birmingham B5 TT, UK Received 6 April 998; Accepted June 998 Abstract were several such genes segregating in a Mendelian fashion in any given population and their effects were approxi- The combination of molecular marker and trait data to mately additive ( Kearsey and Pooni, 996). explore the individual genes concerned with quantitat- It is difficult to define a quantitative trait precisely. The ive traits, QTL analysis, has become an important tool best that can be said is that such a trait appears to show to allow biologists to dissect the genetics of complex a continuous range of variation in a population, which is characters. However, the mathematical and statistical more or less normally distributed. There are no obvious techniques involved have deterred many from underdiscontinuities in the distribution as might be expected of standing what the methods achieve and appreciating a classical, single gene trait, such as the 55 distribution their strengths and weaknesses. of genotypes and phenotypes in an F. Such qualitative This paper is designed to give a non-mathematical genes have a large effect on the phenotype compared to explanation of the principles underlying these anathe environment and, dominance apart, genotypes have lyses, to discuss their potential and to provide an introrecognizably different phenotypes. Very often one of the duction to the techniques used in the subsequent alleles is non-functional or very dysfunctional, which papers in this series of articles based on the SEB results in the clear phenotype. symposium. However, allelic differences may occur in structural or regulatory genes which alter the genes action slightly and Key words: Gene mapping, QTL analysis, quantitative so produce much smaller phenotypic effects. This type of genetics. allelic variation is what is assumed to underlie quantitative variation ( Kearsey and Pooni, 996). Thus, yield in Introduction cereals is the result of the combined effects of many genes, from those which control grain and tiller number, through Although most of the advances in genetics over the last those that affect photosynthesis and metabolism, to genes century have been concerned with structural variation in controlling root development and germination time. It is single, so-called major genes, much of the natural vari- not difficult to imagine that minor allelic variants exist at ation observed in our own species and in the crops, many of these loci resulting in a wide range of yields domestic animals and other populations that are studied, when assembled in different combinations in a population. are due to much more minor genetic changes in many Although each gene is segregating in a standard genes. QTL analysis is the phrase used currently to study Mendelian manner, the overall effect of all the genes is this genetic variation, to locate the genes responsible and to produce a wide range of phenotypes and this variation to explore their effects and interactions. is further blurred by differences in the environment giving QTL is the acronym for Quantitative Trait Loci, genes the Normal range of variation. This does not mean that which underlie quantitative traits (Gelderman, 975). some of the genes involved might not also be genes whose Before the discovery of molecular markers they were mutant effects are well known, nor does it mean that known as polygenes (Mather, 949). Little was known some of the larger, qualitative allelic differences are not about what these genes were or how they controlled the also present, but concealed in the other variation. It is traits apart from the fact that for any given trait there also possible to obtain a continuous phenotypic distribu- Fax: +44 44 5463/595. E-mail: m.j.kearsey@bham.ac.uk Oxford University Press 998 Downloaded from https://academic.oup.com/jxb/article-abstract/49/37/69/44590 on 3 January 08

60 Kearsey tion with very few QTL, given a modest amount of and map 0 to 50 segregating markers per chromosome. environmental variation. Most will be in non-coding regions and will not affect One of the aims of QTL analysis is to explore the any trait directly, but some at least will be linked to QTL individual QTL and discover more of their action, interaction which do affect the trait of interest. QTL analysis depends and precise location. Because they are also very on the fact that where such linkage occurs, the marker important in agriculture and medicine, there are major locus and the QTL will not segregate independently and practical reasons for knowing more about them. so differences in those marker genotypes will be associated with different trait phenotypes. Situations where genes Background information about quantitative traits fail to segregate independently are said to display linkage disequilibrium. The variation, V, among individuals in a population Almost any type of population is amenable to such P such as an F for any trait can be easily measured. It is analysis. The most useful are those derived from a cross caused by genetic and environmental components, so between two inbred lines (F, backross, recombinant V =V +V. It is relatively simple to devise experiments, inbred lines, doubled haploid lines) because the marker- P G E involving comparisons between parents and offspring or QTL linkages in the F cause the derived populations to with various types of family, to estimate V and V be in linkage disequilibrium. With natural populations, G E ( Falconer and Mackay, 996; Kearsey and Pooni, 996). such as man, farm animals and tree species, the linkage For example, if a family of inbred individuals were raised associations have to be explored within families. This is in the experiment, the variation between them would be because a consistent association between QTL and marker V alone. Hence V could be estimated as V V. The genotype will not exist across the species except in the E G P E proportion of V arising from genetical causes is the unlikely situation that a given marker is completely linked P heritability of the trait in that population, h=v /V. to the QTL. This account will concentrate on the analysis G P For most traits of economic or medical interest, heritabilities are characteristically less than 50%, often much less, are analogous in principle but statistically more complex. of populations derived from an F ; the other situations so most of the phenotypic variation among individuals is If the F is heterozygous for a large number of molecu- environmental. lar markers, M,M,... M, etc. these and the trait can i It is possible to estimate the combined effects of all the be scored in individuals of the F or other derived QTL to the variation. The genetical variation, V, is a population, and the markers mapped. For any particular G function of Sa, where a is half the difference in phenotype marker locus, M, the average trait score of each of the i between alternative homozygotes of a QTL. For example, marker genotypes can be calculated, i.e. of M M, i i if there are two allelic forms of a QTL in a population, M M and M M, (Table ). If this marker is on a i i i i Q+ and Q, with Q+ causing a greater phenotype, then different chromosome to any QTL, then the QTL alleles a=(q+q+ Q Q )/. If one was to perform divergent will be segregating completely independently of the selection for the trait over several generations, one would marker, i.e. M M will occur with all possible QTL i i produce high lines with Q+ alleles at each QTL and low genotypes. So will the other two marker genotypes and lines with all the Q alleles. The difference between these hence the average phenotype of all three will be the same lines would be Sa. So the combined effects of all QTL and intermediate, i.e. about the population mean (Sa and Sa) can be estimated, but not their individual (Table a). If the marker locus is on the same chromosome and close to a QTL, then M M homozygotes will i i effects. The combination of trait data and molecular markers enables these individual effects to be identified mostly be Q+Q+ whilst the M M will mostly be Q Q, i i and forms the next step of QTL analysis. and so they will differ in phenotype (Table b). The size of this difference will depend on the effect of that QTL, Principles a, and how close the marker is to the QTL; the closer they are, the greater the difference (Fig. ). The difference The basic problem with studying a quantitative trait has between the genotypes will be maximal and equal to a always been that the phenotype of a given genotype tells if the marker and QTL are so close that they do not us little about the genotype itself; two plants could be recombine, i.e. M and the QTL cosegregate. i m high but have very different genotypes. The discovery Figure shows some actual data to illustrate how the of extensive and easily recognizable molecular variation difference between the marker means varies with proxim- has opened up the possibility of studying individual QTL ity to a QTL. The actual marker differences are subject ( Lander and Botstein, 989). to error variation of course, but one can see by eye that The principle is simple. Molecular markers give unam- they peak at 37 cm, so suggesting a single QTL, and that biguous, single site genetic differences that can easily be the effect of the QTL, a, is about 3 d. It is important to scored and mapped in most segregating populations. It is note that all the markers along the chromosome show not difficult in populations of most species to identify some effect of the QTL, even those well separated from Downloaded from https://academic.oup.com/jxb/article-abstract/49/37/69/44590 on 3 January 08

QTL analysis 6 Table. Relationship between marker genotype and mean trait trait value (y) onto the marker genotype (x) ( Table ). If value (a) marker and QTL on different chromosomes (unlinked), the marker and QTL are on different chromosomes there (b) marker and QTL on same chromosome (linked) will be no regression, while if they are on the same (a) F chromosome, the regression will be maximal at those M Q+ markers closest to the QTL, mirroring the shape of Fig.. M Q Another approach is to attempt to use regression analysis Frequencies of genotypes in F to fit a line to the combined marker data for a given chromosome as has been done in Fig.. This will locate Marker genotype Q+Q+ Q+Q Q Q Mean trait score the most likely QTL position, estimate its effect, a, test (x) (y) that it is significant and test if the single QTL adequately M M B D B Intermediate explains the data for that chromosome ( Kearsey and M M B D B Intermediate Hyne, 994). M M B D B Intermediate The most commonly used analytical approaches explore Difference between trait scores of M M and M M zero. the interval between pairs of markers for the presence of Conclusion: No relationship between trait score (y) and marker QTL. Hence they are known as interval mapping techgenotype (x). niques (Lander and Botstein, 989). They essentially look (a) F at the trait information from each adjacent pair of marker M Q+ loci and use this to infer the likelihood of a QTL being at any given position between them. For example, in M Q Fig., the markers at positions 37 and 55 cm have the Frequencies of genotypes in F highest trait scores, so the QTL is likely to be between them. Had both marker values been the same, then the Marker genotype Q+Q+ Q+Q Q Q Mean trait score (x) (y) most likely QTL position would be midway between them, whilst if the QTL was closer to the left hand M M Most Few Rare High marker, then that marker would have the higher score, M M Few Most Few Intermediate M M Rare Few Most Low as indeed it does. By calculating the likelihood of a QTL across all intervals (compared to getting that result by Difference between trait scores of M M and M M large. chance alone) gives a likelihood ratio (or LOD) profile. Conclusion: Strong relationship between trait score (y) and marker genotype (x). A similar result can be achieved by regression (Haley and Knott, 99), where the profile of probability associated it. This is because there are few crossovers on a chromosome with the F test for the regression ( pf) is used instead of and a high proportion of chromosomes emerge LOD. Where the LOD or pf exceed some significance from meiosis without being involved in any crossover. threshold indicates the likely location of the QTL and Hence a single QTL will always show some association provides information on its confidence interval (Churchill with all linked markers. and Doerge, 994; Mangin et al., 994). In the example in Fig., the estimated additive effect These techniques can be elaborated in various ways to of a=3 d can be compared with the estimate of Sa improve their precision and reliability (Jansen, 993; obtained from this cross (43.3), 3/43.3, which shows that Jansen and Stam, 994). Thus every time a QTL is % of the genetic variation has been explained by this identified, its effect can be removed from the error, so one QTL. Similar analyses of other chromosomes will increasing the precision of future tests. If all QTL are identify some of the remaining QTL. identified, only V should remain. Parameters can be built E Although the principle of QTL analysis has been illus- into the models to allow for environmental effects such trated here without resort to statistics, in practice, one as sites and years, or sex effects in animals. needs statistical methods to identify the most likely QTL QTL analysis of humans and other outbreeding populations positions and effects, to test their significance and to involves the additional problem that each individual indicate their reliability. This subject has exercised the family has to be handled separately and the data combined. minds of many statisticians and a variety of approaches A particular pair of parents may represent the designed to increase precision and cope with more com- equivalent to the parents of an F or backcross with plex data sets have been devised and statistical software respect to a particular marker locus and a QTL and hence developed. the methods discussed above can be applied to that The simplest approach is to use a t-test or ANOVA family. However, the family is very small and so marker to test if the differences between the marker means are QTL data from many families have to be combined. significant for the trait. This does not locate the QTL, Moreover, the linkage phase in the parents, i.e. whether but simply confirms that the eyeballed location indicates the marker gene and QTL are in coupling or repulsion, a real effect. An analogous approach is to regress the is not known and will vary from family to family and Downloaded from https://academic.oup.com/jxb/article-abstract/49/37/69/44590 on 3 January 08

6 Kearsey Fig.. Relationship between difference between means of marker homozygotes (M i M i M i M i ) and position on chromosome with a QTL of a= unit at 50 cm. (i) Expected values. (ii) Possible observed values with six markers unevenly spaced along the chromosome. will have to be deduced from the data. The approaches are therefore similar, but more elaborate and less precise. The major problem associated with all QTL analyses is that the individual QTL effects are small. As stated above, heritabilities for most traits are generally less than 50%, so that the heritability associated with individual QTL is a small fraction of this. The more QTL there are in the population, the smaller their individual contribution and the more difficult they are to detect. It thus follows that only the larger QTL are ever detected and this leads to the biased impression that there are few QTL and they have large effects. These low individual QTL heritabilities also cause the estimates of QTL location to have large confidence intervals ( Hyne et al., 995). Thus, although the analysis of a particular set of individuals in a popula- tion suggests that the QTL is at x cm on a given chromosome, its true position may well be anywhere within a range ±0 0 cm from this, i.e. over quite a large region of the chromosome. Unless the QTL effect is large and the environmental variation is greatly reduced by replication, it is difficult to reduce the confidence interval to less than about 0 cm. Such accuracy is not very helpful for positional cloning, but may be perfectly adequate for Marker Assisted Selection, MAS. It is a popular belief that ever denser marker maps help to resolve this problem, but beyond a density of about one marker per 0 cm, there is very little gain. By far the most important factor is the number of genotypes tested, but there are practical limits to this, particularly in an agricultural context. Downloaded from https://academic.oup.com/jxb/article-abstract/49/37/69/44590 on 3 January 08

QTL analysis 63 Fig.. Observed data (cf. Fig. ) for flowering time for chromosome 9 of Brassica oleracea, suggesting the position of a single QTL, a=3 d, at 37 cm. Best fitting line is also shown. References Churchill GA, Doerge RW. 994. Empirical threshold values for quantitative trait mapping. Genetics 38, 963 7. Falconer DS, Mackay TFC. 996. Introduction to quantitative genetics, 4th edn. London: Longman. Gelderman H. 975. Investigation on inheritance of quantitative characters in animals by gene markers. I. Methods. Theoretical and Applied Genetics 46, 300 9. Haley CS, Knott SA. 99. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69, 35 4. Hyne V, Kearsey MJ, Pike DJ, Snape JW. 995. QTL analysis: unreliability and bias in estimation procedures. Molecular Fig. 3. Observed data (as Fig. ) but illustrating two QTL in dispersion; Breeding, 73 8. one at 46 cm with a positive effect of 5 d and a second at 78 cm with Jansen RC. 993. Interval mapping of multiple quantitative a decreasing effect of 5 d. The solid line shows the expected marker trait loci. Genetics 35, 05. means. The close proximity (3 cm) of the QTL in dispersion reduces the observed marker values because they are cancelling each others Jansen RC, Stam P. 994. High resolution of quantitative traits effect, i.e. the QTL of 5 d at 46 cm appears as a peak of just d. into multiple loci via interval mapping. Genetics 36, 447 55. Kearsey MJ, Hyne V. 994. QTL analysis: a simple marker regression approach. Theoretical and Applied Genetics 89, 698 70. The relatively large size of the confidence intervals Kearsey MJ, Pooni HS. 996. The genetical analysis of makes it more difficult to distinguish two QTL on a quantitative traits. London: Chapman and Hall. chromosome unless they are far apart. Figure 3 illustrates Kearsey MJ, Farquhar AGL. 998. QTL analysis in plants: the effects on marker means of two QTL in dispersion. where are we now? Heredity 80, 37 4. There is a characteristic pattern of changes in marker Lagercrantz U, Putterill J, Coupland G, Lydiate D. 996. Comparative mapping in Arabidopsis and Brassica, fine scale means from positive to negative along the chromosome, collinearity and congruence of genes controlling flowering but one QTL tends to reduce the effects of the other so time. The Plant Journal 9, 3 0. making the detection of either more difficult. Had the Lander ES, Botstein D. 989. Mapping Mendelian factors two QTL been in association, their individual effects underlying quantitative traits using RFLP linkage maps. could combine to give the appearance of a false, ghost Genetics, 85 99. Mangin B, Goffinet B, Rebai A. 994. Constructing confidence QTL somewhere between them. intervals for QTL location. Genetics 38, 30 8. Despite these problems, QTL analyses of segregating Mather K. 949. Biometrical genetics, st edn. London: populations have been very useful in identifying major Methuen. QTL which may suggest candidate loci (Osborn et al., Osborn TC, Kole C, Parkin IAP, Sharpe AG, Kuiper M, Lydiate 997; Lagercrantz et al., 996) and permit accelerated DJ, Trick M. 997. Comparison of flowering time genes in Brassica rapa, B. napus and Arabidopsis thaliana. Genetics MAS. For further reviews of QTL analysis, the reader is referred to the following: Tanksley (993) and Kearsey and Farquhar (998). 56, 3 9. Tanksley SD. 993. Mapping polygenes. Annual Review of Genetics 7, 05 33. Downloaded from https://academic.oup.com/jxb/article-abstract/49/37/69/44590 on 3 January 08