Marker-Assisted Selection for Quantitative Traits

Readings:

Bernardo, R. 2001. What if we knew all the genes for a quantitative trait in hybrid crops? Crop Sci. 41:1-4.
Eathington, S.R., J.W. Dudley, and G.K. Rufener II. 1997. Usefulness of marker-QTL associations in early generation selection. Crop Sci. 37:1686-1693.
Stuber, C.W., M. Polacco, and M.L. Senior. 1999. Synergy of empirical breeding, marker-assisted selection, and genomics to increase crop yield potential. Crop Sci. 39:1571-1583.
Tanksley, S.D., and S.R. McCouch. 1997. Seed banks and molecular maps: Unlocking genetic potential from the wild. Science 277:1063-1066.

Assigning Breeding Values to Lines for Marker-Assisted Selection

Much of the practical interest in identifying and mapping QTLs lies in using markers to aid selection for those QTLs. The next logical question is how marker genotype data can be used to select particular lines, plants, or families. The breeding goal determines how these data are used: for backcrossing small genomic segments from a donor parent into an elite line, evaluating graphical genotypes of backcross progeny may be the primary method. For choosing parents with which to initiate new breeding populations ("forward" or "offensive" breeding), not only genotypic value but also complementation of parental genotypes and phenotypes should be considered.

Net Molecular Scores

Lande and Thompson (1990) refer to the breeding value of an individual based on its DNA marker genotype as its net molecular score. The net molecular score of an individual or a line is easy to calculate once the individuals have been genotyped and the QTLs have been identified and their effects estimated. For example, assume that three QTLs have been identified in an F2 population using single-factor ANOVA. The effects of each QTL have been estimated, and the additive effect for each locus (a) was defined as half of the difference between the mean of the AA class and the mean of the BB class, as we showed previously.
The effects estimated at the three loci were +10, +5, and -10, respectively. The breeding value of an individual based on one marker locus is +a for AA genotypes, 0 for AB heterozygotes (because such a plant passes favorable and unfavorable alleles to its progeny with equal frequency), and -a for BB genotypes. Given five F2 plants genotyped at the three marker loci defining the QTLs, the net molecular score of each plant is the sum of its breeding values across the QTL marker loci (Table 1).
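The calculation can be sketched in a few lines of Python. This is only an illustration: the additive effects and the five plant genotypes are those of Table 1, while the function and variable names are made up for this example.

```python
# Additive effects (a) estimated at the three marker loci, as given in the text.
ADDITIVE_EFFECTS = [10, 5, -10]

# Breeding-value multiplier per marker genotype: +a for AA, 0 for AB, -a for BB.
GENOTYPE_SIGN = {"AA": 1, "AB": 0, "BB": -1}


def net_molecular_score(genotypes, effects=ADDITIVE_EFFECTS):
    """Sum the single-locus breeding values over all QTL marker loci."""
    if len(genotypes) != len(effects):
        raise ValueError("one genotype is needed per marker locus")
    return sum(GENOTYPE_SIGN[g] * a for g, a in zip(genotypes, effects))


# Marker genotypes of the five F2 plants in Table 1 (locus 1, locus 2, locus 3).
PLANTS = {
    1: ["AA", "AA", "AA"],
    2: ["AA", "AA", "BB"],
    3: ["AB", "BB", "AB"],
    4: ["AB", "BB", "AA"],
    5: ["BB", "AA", "AB"],
}

scores = {plant: net_molecular_score(g) for plant, g in PLANTS.items()}
best_plant = max(scores, key=scores.get)  # plant with the highest score
```

Note that the BB genotype at locus 3 contributes +10 because the estimated effect at that locus is negative.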

Table 1. Net molecular score is the sum of individual marker locus breeding values.

         Locus 1            Locus 2            Locus 3
Plant    Genotype  Value    Genotype  Value    Genotype  Value    Net molecular score
1        AA        +10      AA        +5       AA        -10      +5
2        AA        +10      AA        +5       BB        +10      +25
3        AB        0        BB        -5       AB        0        -5
4        AB        0        BB        -5       AA        -10      -15
5        BB        -10      AA        +5       AB        0        -5

Based on markers alone, we would select individual 2 as the best.

Combining Net Molecular Scores and Phenotypic Values

In reality, we often have phenotypic data as well as genotypic data for each individual or line. Which data should we use for selection, or is there some way to use the two kinds of information together? Lande and Thompson (1990) suggested the use of a selection index to combine the genotypic and phenotypic data (based on the same theory used to develop optimum phenotypic selection indices). The selection index is:

$$I = b_P\,p + b_m\,m$$

where
  $p$ is the phenotypic deviation of the individual,
  $m$ is the net molecular score of the individual,
  $b_P$ is the weight for the phenotypic information,
  $b_m$ is the weight for the genotypic information.

Arbitrarily, we can set the weight on the phenotypic data to one, and the optimum weight on the genotypic data is then:

$$b_m = \frac{(1/h^2) - 1}{1 - p}$$

where
  $h^2$ is the heritability of the trait and unit under selection (if individuals are being selected, use heritability on an individual basis; if families or lines are being selected, use the heritability of line means),
  $p$ is the proportion of additive genetic variance explained by the QTLs detected (in reality, by the markers used to detect the QTLs); this $p$ is not the phenotypic deviation $p$ of the index.

The heritability and the proportion of additive variance explained by the markers are estimated from the same experiment that was used to detect the QTLs. The index gives more weight to the net molecular score relative to the phenotypic data as the heritability decreases and as the proportion of additive genetic variance explained by the QTLs increases. The index weights of the molecular score under different levels of heritability and p are given in Table 2.

Table 2. Index weight for the net molecular score, relative to a weight of 1 for the phenotypic data, at different values of heritability (h^2, rows) and proportion of additive genetic variance explained by markers (p, columns).

h^2     p = 0.1   p = 0.25   p = 0.5   p = 0.75   p = 1
0.1     10        12         18        36         total weight
0.25    3.33      4          6         12         total weight
0.5     1.11      1.33       2         4          total weight
0.75    0.37      0.44       0.67      1.33       total weight
1       0         0          0         0          either

From this table, you can see that as heritability increases, particularly as it becomes greater than 50%, more weight will tend to be given to the phenotypic data. At lower heritabilities, the marker information is more important. If the markers can account for most of the additive variance, they will be given more weight, even if the trait has a high heritability.

Selection for multiple traits with marker information

What happens when you want to improve multiple traits? Basically, the situation becomes quite complicated. Assume that you have phenotypically and genotypically evaluated a population for a number of characters, each of which you want to improve. You can perform QTL mapping for each trait separately (or together, using the theory developed by Jiang and Zeng (1995)) and develop a net molecular score for each individual or line for each trait. Then the question is: how do we combine the net molecular scores of the different traits into a multitrait net molecular score, and how do we combine this with the phenotypic data from the multiple traits? Again, the optimum solution comes from index selection theory and was worked out by Lande and Thompson (1990). Unfortunately, the result is complicated! The optimum index is:

$$I = b_{P1}p_1 + b_{P2}p_2 + \dots + b_{Pn}p_n + b_{m1}m_1 + b_{m2}m_2 + \dots + b_{mn}m_n$$

where
  $p_i$ is the individual's phenotypic deviation from the mean for trait i,
  $m_i$ is the individual's net molecular score for trait i,
  $b_{Pi}$ is the weight for the trait i phenotypic data,
  $b_{mi}$ is the weight for the trait i net molecular score.

This is written in indecipherable matrix form as:

$$I = \mathbf{b}_P^{T}\mathbf{p} + \mathbf{b}_m^{T}\mathbf{m}$$

where the weights and the phenotypic and net molecular scores are arranged in vectors. The solutions for the phenotypic and genotypic index weights are given by:

$$\mathbf{b}_P = (\mathbf{P} - \mathbf{M})^{-1}(\mathbf{G} - \mathbf{M})\,\mathbf{d}$$

$$\mathbf{b}_m = [\mathbf{I} - (\mathbf{P} - \mathbf{M})^{-1}(\mathbf{G} - \mathbf{M})]\,\mathbf{d}$$

where
  $\mathbf{P}$ is the phenotypic variance-covariance matrix for the different traits,
  $\mathbf{G}$ is the genetic variance-covariance matrix for the different traits,
  $\mathbf{M}$ is the variance-covariance matrix for the different net molecular scores,
  $\mathbf{d}$ is the vector of economic weights for the different traits (chosen by the breeder),
  $\mathbf{I}$ in the second equation is the identity matrix (not the index).

A very simple example may help to explain the components of these matrices and the method of solution for the index weights. Suppose that we are interested in two traits, X and Y, which are slightly negatively correlated (both genetically and phenotypically). QTL mapping was performed on the two traits in the same population, with the result that 3 QTLs were detected for trait X and 3 QTLs were detected for trait Y, one of the QTLs being in common between the two traits, and all of the QTLs unlinked:

  $M_1$ has an effect of +5 on X and 0 on Y
  $M_2$ has an effect of -4 on X and 0 on Y
  $M_3$ has an effect of +3 on X and -4 on Y
  $M_4$ has an effect of 0 on X and +5 on Y
  $M_5$ has an effect of 0 on X and -3 on Y

The heritability of X is 50%, with a phenotypic variance of $\sigma^2_{Px} = 80$ and an additive variance of $\sigma^2_{Ax} = 40$. The amount of additive variance explained by the markers can be computed as the sum of the variances of the different markers, since the markers are unlinked and therefore have no covariance:

$$\sigma^2_{mx} = \sigma^2_{m1x} + \sigma^2_{m2x} + \sigma^2_{m3x} = \tfrac{1}{2}\,[5^2 + (-4)^2 + 3^2] = 25$$

So the proportion of additive variance explained by the markers is p = 25/40 = 0.625. If index selection were performed just on trait X, the index would be:

$$I = p_X + \frac{(1/0.5) - 1}{0.375}\,m_X = p_X + 2.67\,m_X$$

The heritability of Y is 30% with a phenotypic variance of 100, so the additive variance is 30. The markers explain $\sigma^2_{my} = \tfrac{1}{2}\,[(-4)^2 + 5^2 + (-3)^2] = 25$, or p = 25/30 = 0.83 of the additive variance. So, if selection were practiced only on trait Y, the index would be:

$$I = p_Y + \frac{(1/0.3) - 1}{0.167}\,m_Y = p_Y + 14.0\,m_Y$$

Now, assume that the economic weights on the two traits are considered equal, so:

$$\mathbf{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
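The single-trait weight formula, b_m = [(1/h^2) - 1]/(1 - p), is easy to check numerically. The sketch below recomputes a few entries of Table 2 and the trait X example. The function names are illustrative only, and returning infinity for "total weight" and NaN for "either" is a convention chosen for this example, not something taken from Lande and Thompson (1990).

```python
import math


def marker_weight(h2, p):
    """Weight on the net molecular score when the phenotypic weight is 1.

    h2 is the heritability of the selection unit; p is the proportion of
    additive genetic variance explained by the markers.
    """
    if not (0 < h2 <= 1 and 0 <= p <= 1):
        raise ValueError("h2 must be in (0, 1] and p in [0, 1]")
    if p == 1:
        # Markers explain everything: all weight goes to the marker score
        # ("total weight"), unless h2 is also 1, where the ratio is 0/0 ("either").
        return math.nan if h2 == 1 else math.inf
    return (1 / h2 - 1) / (1 - p)


def marker_variance(effects):
    """Variance explained by unlinked markers: half the sum of squared effects."""
    return 0.5 * sum(a * a for a in effects)


# Trait X from the worked example: effects +5, -4, +3; additive variance 40; h2 = 0.5.
var_m_x = marker_variance([5, -4, 3])
p_x = var_m_x / 40
b_m_x = marker_weight(0.5, p_x)
```

With these inputs `var_m_x` is 25, `p_x` is 0.625, and `b_m_x` rounds to the 2.67 used in the trait X index.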

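Assume that the phenotypic covariance of X and Y is -20, so the phenotypic variance-covariance matrix is:

$$\mathbf{P} = \begin{bmatrix} 80 & -20 \\ -20 & 100 \end{bmatrix}$$

Assume that the genetic covariance of X and Y is -10, so, with additive variances of 40 for X and 0.3 × 100 = 30 for Y, the genetic variance-covariance matrix is:

$$\mathbf{G} = \begin{bmatrix} 40 & -10 \\ -10 & 30 \end{bmatrix}$$

The net molecular score variance-covariance matrix has the variances due to the net molecular scores ($\sigma^2_{mx}$, $\sigma^2_{my}$) on the diagonal and the covariance between the markers involved in the two net molecular scores off the diagonal. There are no covariances between $M_1$ and $M_2$ on the one hand and $M_4$ and $M_5$ on the other, because they neither have pleiotropic effects nor are linked (the only possible causes of genetic correlations). $M_3$ has effects on both traits, so it causes a covariance between the two net molecular scores of (1/2)(3)(-4) = -6. So the variance-covariance matrix associated with the net molecular scores is:

$$\mathbf{M} = \begin{bmatrix} 25 & -6 \\ -6 & 25 \end{bmatrix}$$

With all of this information in hand, the index weights for phenotypic data are solved as:

$$\mathbf{b}_P = (\mathbf{P} - \mathbf{M})^{-1}(\mathbf{G} - \mathbf{M})\,\mathbf{d} = \begin{bmatrix} 55 & -14 \\ -14 & 75 \end{bmatrix}^{-1} \begin{bmatrix} 15 & -4 \\ -4 & 5 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

The index weights for net molecular scores are:

$$\mathbf{b}_m = [\mathbf{I} - (\mathbf{P} - \mathbf{M})^{-1}(\mathbf{G} - \mathbf{M})] \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$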
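The matrix solution can be sketched for the two-trait case in plain Python. The inputs below are assumptions rebuilt from the variances and covariances stated in the text (including an additive variance of 0.3 × 100 = 30 for Y and equal economic weights), and the helper names are illustrative. Because the inputs are rebuilt this way, the weights it returns need not agree exactly with the rounded index quoted in the notes.

```python
def mat_sub(a, b):
    """Element-wise difference of two 2x2 matrices."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def mat_vec(a, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [sum(a[i][j] * v[j] for j in range(2)) for i in range(2)]


def inv2(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def index_weights(P, G, M, d):
    """Return (b_P, b_m) for b_P = (P-M)^-1 (G-M) d and b_m = d - b_P."""
    b_p = mat_vec(inv2(mat_sub(P, M)), mat_vec(mat_sub(G, M), d))
    # [I - (P-M)^-1 (G-M)] d is the same as d minus b_P.
    b_m = [d[i] - b_p[i] for i in range(2)]
    return b_p, b_m


# Assumed inputs for traits X and Y, rebuilt from the values stated in the text.
P = [[80.0, -20.0], [-20.0, 100.0]]   # phenotypic variances and covariance
G = [[40.0, -10.0], [-10.0, 30.0]]    # additive genetic variances and covariance
M = [[25.0, -6.0], [-6.0, 25.0]]      # net molecular score variances and covariance
d = [1.0, 1.0]                        # equal economic weights

b_p, b_m = index_weights(P, G, M, d)

# Rescale so that the phenotypic weight on trait Y equals 1.
scaled = [w / b_p[1] for w in b_p + b_m]
```

Rescaling changes only the units of the index, not the ranking of candidates, so any one weight can be set to 1.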

So, setting the index weight on the phenotypic data for trait Y arbitrarily to 1, the index becomes:

$$I = 3.83\,p_x + p_y + 15.43\,m_x + 19.91\,m_y$$

The index weight on each marker score is greater than that for either phenotypic score, which is not surprising, given the previous results from selecting for each trait independently. What is surprising is the relative weighting of the phenotypic and marker data across the two traits: the weights depend on so many parameters that they simply can't be guessed at unless one works through the equations. As is true of all index selection programs, the ability to obtain proper weightings on the different traits depends on having good phenotypic data from which to estimate the genetic and phenotypic variances and covariances.

Two examples of actual MAS programs combining marker scores and phenotypic data for multiple traits have used simpler methods to obtain index values for each line under selection. In one case, a multiple-trait index was developed based on traditional index selection theory, QTLs affecting the index were mapped, and the net molecular score and phenotypic score for the index were combined as for a single trait (Edwards and Johnson, 1994). In the other case, QTLs were mapped for each trait separately, and a net molecular score for the index was then obtained by summing the net molecular scores for each trait weighted by their economic values. Again, this allowed the net molecular scores and the phenotypic values for the index to be combined as for a single trait (Eathington et al., 1997).

Selection for optimum genotype

The theory presented above is valid for selection in a classical recurrent selection setting, but is it relevant to typical commercial plant breeding programs? The goal of recurrent selection is to improve the mean value of a population of individuals. This is the case with outcrossing species that are intolerant of inbreeding, such as many forage species.
The goal of virtually all commercial self-pollinated crop breeding programs, however, is to develop the best possible single inbred genotype from a population, and the goal of virtually all commercial hybrid crop breeding programs is to develop the best possible hybrid from a cross of two inbred genotypes. In both of these latter cases, selection is not practiced in the context of continual intermating, but more typically in the context of continual inbreeding and occasional intermating. Is there any theory available for marker-assisted selection in the context of pedigree-type breeding programs? In fact, not much, but van Berloo and Stam (1998) proposed a helpful marker-assisted breeding scheme for selection among recombinant inbred lines. There has also been more empirical work done on this type of MAS than on recurrent selection-type MAS, because this is the type of selection that dominates commercial plant breeding programs.

The situation proposed by van Berloo and Stam (1998) is as follows: RILs (equivalent to single-seed descent lines) are developed from a single cross, QTLs are mapped in the F2 generation, and the goal of selection is to produce the best single genotype from crosses among RILs.

The first question may be: why should we worry about intercrossing RILs, when perhaps the best possible genotype is already available in the RIL population? To answer this, assume a reasonable number of unlinked QTLs for the trait of interest, say 8. The frequency of the favorable allele at each of the 8 loci is 0.5 in this population. Therefore, the frequency of a genotype homozygous for favorable alleles at all 8 loci in the inbred population is $(0.5)^8 = 0.0039$, or one in every 256 RILs. To have a 95% probability of recovering at least one such line in a population of RILs, one would need to genotype

$$n = \frac{\log(0.05)}{\log(1 - 0.0039)} = 767$$

RILs, which may be more than can easily be handled. The number would increase if the favorable alleles at two or more of the QTLs were linked in repulsion phase (because a recombination event is required to produce a gamete with both favorable alleles), or if there were more QTLs. The number would decrease if there were favorable alleles in coupling phase at two or more loci.

The method proposed by van Berloo and Stam (1998) is to calculate an index for each combination of lines, based on the probability of recovering the most favorable genotype from the cross of those two lines. Thus, selection is not on individual genotypes, but rather on genotypic cross combinations.
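The sample-size argument for the RIL population (a frequency of about 0.0039 and 767 lines for 95% success) can be checked with a short sketch. The function names are illustrative; the only assumptions are those of the text, namely unlinked loci and a favorable-allele frequency of 0.5 at each locus.

```python
import math


def freq_all_favorable(n_loci, allele_freq=0.5):
    """Frequency of inbred lines fixed for the favorable allele at every unlinked locus."""
    return allele_freq ** n_loci


def lines_needed(target_freq, prob_success=0.95):
    """Smallest n such that at least one target line is expected with prob_success.

    Solves 1 - (1 - f)^n >= prob_success for n.
    """
    return math.ceil(math.log(1 - prob_success) / math.log(1 - target_freq))


f = freq_all_favorable(8)          # exact frequency, 1/256
n_exact = lines_needed(f)          # uses the unrounded frequency
n_rounded = lines_needed(0.0039)   # uses the rounded frequency quoted in the text
```

Using the rounded frequency 0.0039 reproduces the 767 RILs given in the text; the unrounded 1/256 gives 766, a difference that does not affect the argument.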
The index for each cross combination (combination index, or CI) is based on the marker genotypes at intervals containing the target QTLs:

$$CI = \sum_i (a_i \times w_i)$$

where
  the summation is over all i intervals containing QTLs,
  $a_i$ is the additive effect of the QTL in the ith interval (ignoring dominance effects for simplicity),
  $w_i$ is the weight given to the interval based on the genotypes of the two lines at the marker loci defining the ith QTL interval.

The weighting scheme for each interval is based on the frequency of favorable QTL alleles that are expected in the progeny of the cross, and is described in Table 3.
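A minimal sketch of the combination index follows. It reads the Table 3 weights as "the number of parents carrying the favorable allele at both flanking markers", which reproduces all six rows of the table. Treating either recombinant class (M1 - m2 or m1 - M2) as a non-carrier is an assumption of this sketch, and the genotype encoding, function names, and example effects are hypothetical.

```python
def carries_favorable(flanking):
    """True if both flanking markers carry the favorable marker allele.

    A parent's interval is encoded as two characters, e.g. "MM" for M1 - M2,
    "Mm" for M1 - m2, "mm" for m1 - m2 (encoding chosen for this example).
    """
    left, right = flanking
    return left == "M" and right == "M"


def interval_weight(parent1, parent2):
    """Interval weight w_i: number of parents (0, 1, or 2) with both favorable alleles."""
    return int(carries_favorable(parent1)) + int(carries_favorable(parent2))


def combination_index(effects, parent1, parent2):
    """CI = sum over QTL intervals of a_i * w_i for one cross combination."""
    if not (len(effects) == len(parent1) == len(parent2)):
        raise ValueError("effects and both parents must cover the same intervals")
    return sum(a * interval_weight(g1, g2)
               for a, g1, g2 in zip(effects, parent1, parent2))


# Hypothetical example: three QTL intervals and two candidate parent lines.
effects = [4.0, 2.5, 1.0]
line_a = ["MM", "Mm", "mm"]
line_b = ["MM", "MM", "Mm"]
ci = combination_index(effects, line_a, line_b)
```

For these hypothetical lines the interval weights are 2, 1, and 0. In practice the index would be computed for every pair of candidate lines and the highest-ranking crosses would be made.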

Table 3. Weights for different interval combinations based on parental flanking marker genotypes (M1 and M2 are the marker alleles defining the interval carrying the favorable QTL allele).

Parent 1 flanking      Parent 2 flanking      Interval
marker genotype        marker genotype        weight
M1 - M2                M1 - M2                2
M1 - M2                m1 - m2                1
M1 - M2                M1 - m2                1
m1 - m2                m1 - m2                0
m1 - m2                M1 - m2                0
M1 - m2                M1 - m2                0

This method assumes that if both flanking marker alleles associated with the favorable QTL allele are present, the favorable QTL allele must be in the interval (which is not true if double crossovers have occurred), and that if either of the favorable alleles at the flanking markers is absent, the parent does not have the favorable QTL allele (again, not true if the recombination in the interval occurred between the QTL and the marker locus with the unfavorable allele). The method also bases selections solely on genotype data; no phenotypic data are used.

The method proposed by van Berloo and Stam (1998) is rather limited, but the basic idea can easily be extended to more realistic situations. One can imagine modifying this scheme to take the following into account:

- Interval weights could be based on the conditional probability that an interval has the favorable QTL allele given the flanking marker genotypes, in the same way as is done in interval mapping theory. This should use the marker data more efficiently.

- Phenotypic data can be integrated into the selections in some way, perhaps by augmenting the combination index with the mean phenotype of the two parents, weighted according to selection index theory.

- Selection could be initiated before the inbreeding process is completed. This seems to make more sense than waiting until inbreeding is completed, unless one is going to wait until the RILs are available to perform QTL mapping. In early generations, the weighting scheme would have to involve the conditional probability that heterozygous genotypes have the QTL allele of interest, but as mentioned, this theory is already available in interval mapping. In addition, one can imagine that F2 progeny that have at least one favorable allele at all QTL intervals could be identified, and a pedigree-type approach could be used to grow large numbers of their F3 progeny. Since the probability that an F2 will be heterozygous or homozygous for the favorable allele at a locus is 0.75, the probability that an F2 is heterozygous or homozygous for the favorable alleles at 8 unlinked loci is $(0.75)^8$, or about 10%, so the probability of recovering such an F2 is much greater than the probability of recovering a completely homozygous inbred with all of the favorable alleles. Resources could then be concentrated on evaluating the offspring of a smaller number of early generation individuals. Failing that, early generation individuals could be intermated immediately to generate new recombinant offspring, rather than waiting to intermate inbred lines. This procedure might take better advantage of the pedigree breeding methods frequently used by plant breeders.

- As discussed in the QTL mapping section, QTL mapping in the F2 generation is not ideal because single-plant evaluations are generally worthless for many important traits in crop species, such as yield. In hybrid crops, such as maize, however, individual F2 plants can be topcrossed to inbred testers and selfed. The topcrosses can be evaluated in replicated yield trials to provide phenotypic data, while the selfed progeny can be maintained in a pedigree breeding program. This would allow use of marker-assisted selection in the context of early-generation testing for yield.

Theoretical, Simulation and Empirical Evaluations of MAS

The question remains: is MAS useful? Under what conditions will it be most useful? A number of simulation studies and a few empirical results are available that address these questions.

Evaluation of marker-assisted recurrent selection

As mentioned, marker-assisted recurrent selection methods are not immediately applicable to most commercial crop breeding programs. The results of evaluations of marker-assisted recurrent selection do, however, indicate the strengths and weaknesses of marker-assisted selection in general.
The first evaluation of marker-assisted recurrent selection, by Lande and Thompson (1990), indicated that the relative efficiency (RE) of MAS (that is, index selection combining net molecular scores and phenotypic data) compared to phenotypic selection depended greatly on the heritability of the trait under selection and on the proportion of additive variance explained by the markers (Figure 1 (Lande and Thompson, 1990)). At low heritabilities, MAS generally had much greater RE, while at heritabilities greater than 50% there seemed to be little justification for using MAS. When families rather than individuals are the units of selection, the heritability on which phenotypic selection acts is greater, so the advantage of MAS was less. For example, the maximum RE of MAS on full-sib families (assuming h^2 = 2% and that the markers explained 100% of the genetic variance) was only 1.41.

Edwards and Page (1994) compared marker-assisted to phenotypic recurrent selection using computer simulations, and investigated the effects of selection on single markers versus flanking markers and the effects of recombination distance between markers and QTLs. Their model for marker-assisted selection did not use any phenotypic data, and assumed that all of the QTLs affecting the traits were correctly identified: no false positives, and no QTLs missed. Even in this situation, there seemed to be little advantage to MAS when heritability was at 40%. The need for tightly linked markers for selection was also clear: MAS had no advantage over phenotypic selection when recombination between markers and QTLs was 20%. Selection on single markers was almost as good as selection on flanking markers if the markers were within 5% recombination frequency of the QTLs.

Hospital et al. (1997) used simulations to compare MAS including phenotypic evaluation with purely phenotypic recurrent selection. Their results showed again that when heritabilities were above 40%, MAS offered little advantage, unless marker-only selection could be used for an extra cycle or two per cycle of phenotypic selection. This may be the situation for breeders who can use off-season nurseries for intermating, but in which phenotypic selection is not useful. In this case, alternating cycles of combined marker and phenotypic selection with cycles of marker-only selection provided higher rates of gain per unit of time (because more cycles of selection could be completed per unit of time), even when heritabilities were higher than 0.5. In fact, high heritabilities are even more advantageous in this situation because the QTLs can be identified more precisely, allowing more effective marker-only selection in off-season nurseries. When heritabilities were low, one problem with MAS was the high variance in response: MAS was generally superior to phenotypic selection, but among replicated trials there were some cases where it was not. In such low-heritability cases, they found that using higher Type I error thresholds improved the response from MAS, because the effect of missing a QTL is worse than the effect of selecting on a marker that is not linked to a QTL.

Bernardo (2001) used simulations to compare the efficiency of MAS for hybrids with and without knowledge of the underlying genes. He assumed that one knew which genes control a quantitative trait, but that the effect of each gene on the trait must be estimated for any one specific population. In one sense, this is a best possible case for MAS, because the genes are known, so the markers are the actual genes themselves.
He found that if the number of controlling genes is low (10 genes), then knowing the genes involved in controlling the trait can add 37% efficiency to selection for optimum hybrids and 60% efficiency to selection for inbreds when heritability is 20% (less if heritability is higher). The more stunning result is that if the trait is controlled by many genes (50 or 100), and 100% of the genes are known, then selection based on estimated genetic effects was about equal to phenotypic selection (PS) for hybrids, and 70% less effective than PS for inbreds! Thus, if there are many genes and they are all known, there is no gain, and there may be a significant loss, from this knowledge! Understanding this result requires careful attention to the simulation conditions: Bernardo assumed that the genes are known, but that their effects must be estimated. With many genes, many genetic effects must be estimated, and it is nearly impossible to get good estimates of genetic effects with typical or even large sample sizes (up to n = 2000 in his simulation). Obviously, one could improve the genetic estimates by increasing the sample size even more, but with even larger sample sizes, PS will also become more effective! Bernardo also assumed that selection for inbreds was conducted only in the F2 population derived from the cross of the two best lines. Once the two best lines have been identified phenotypically, the major genes have been fixed for favorable alleles, so selection is based on minor genes, which have particularly poor effect estimates.

A possible objection to Bernardo's study is that there may be situations in which, once a gene is known to affect some trait and its effect can be estimated reasonably well, no further phenotypic evaluation is necessary. At that point, selection on the genotype alone can be conducted without recourse to phenotypic evaluations.

Clearly, one would not want to trust the gene effect estimates for too many generations without checking the value of the progeny phenotypically at some point, but if some generations of phenotypic evaluation can be skipped, then MAS again looks better (or at least not so bad!). Nevertheless, Bernardo's study is important because it demonstrates that MAS will be useful only under specific circumstances, and that it should be implemented only in such circumstances, rather than in any and all situations under the assumption that it must work!

Edwards and Johnson (1994) evaluated 4 cycles of marker-only recurrent selection in two sweet corn populations. They wanted to improve an index of 34 traits, including agronomic characters and quality traits. They first developed a phenotypic index of desired gains combining all the traits, measured in a single environment, and then mapped QTLs affecting the index. Marker-based selection was then conducted at a rate of four cycles per year, based on line combinations predicted to have a high probability of producing ideal offspring, in a manner similar to van Berloo and Stam (1998). Results indicated that gains were made in most of the important traits under selection, except that in one population, gains in agronomic traits were offset by reductions in quality traits. Seemingly, this was due to favorable alleles linked in repulsion phase in the initial population, without sufficient recombination to allow selection for favorable alleles at all loci.

Evaluation of marker-assisted pedigree breeding

van Berloo and Stam (1998) used computer simulation to compare their proposed MAS program against phenotypic selection in terms of developing ideal genotypes from crosses among RILs. Again, when heritability was high, in this case above 50%, there was little to no advantage to using MAS. Decreasing marker density from one marker locus per 10 cM to one marker locus per 20 cM caused only a small reduction in selection response.
They also investigated the effects of Type I and Type II errors in QTL detection and found that, while Type II errors (missing real QTLs) caused a reduction in response, Type I errors (selecting on false positive QTLs) had virtually no effect on response, even when there were as many false QTLs as real QTLs.

The effectiveness of MAS in early-generation testing in maize was tested by Stromberg et al. (1994). In the same generation, they topcrossed 235 F2 plants from a line x population cross onto an inbred tester, selfed the F2s to derive inbred lines, and genotyped the F2s using 34 RFLP markers (which did not cover 4 of the 20 chromosome arms). The experimental scheme is outlined in Fig. 1 (Stromberg et al., 1994). Based on yield trials of the topcrosses, eight QTLs for yield in topcrosses were mapped and used to compute net molecular scores for each F2. Twenty F2-derived lines were selected based on net molecular score, and 20 were selected based solely on phenotypic data, and these were crossed to the same tester for evaluation. There were no significant differences between the two selected groups, and neither selected group surpassed the unselected control, so neither MAS nor phenotypic selection was effective in this case. Perhaps this was due to poor correlations between yields in the testing and evaluation environments, which would reduce the effectiveness of either selection method. This illustrates the importance of good phenotypic evaluations for both MAS and traditional breeding methods.

MAS for early generation testing was re-evaluated using a similar scheme, but with 190 F2:3 lines tested in topcrosses in 4 environments and genotyped at 157 marker loci (Eathington et al., 1997). Selection based on phenotypes only, on marker scores only, and on indices combining marker and phenotype information (Lande and Thompson, 1990) was compared for each of three traits individually and for indices combining the three traits. For the multitrait index including phenotypic and genotypic data, they weighted the net molecular scores by the economic value of the traits. They found that phenotypic selection was superior to marker-only selection, and that MAS combining molecular and phenotypic data in the early generation was just slightly better than phenotypic selection in identifying superior late-generation line crosses. The superiority of MAS combining phenotypic and genotypic data increased a bit when index selection was considered. The heritabilities of line means for each of the three traits were high (77-90%), and consequently, the weights given to net molecular scores in the genotype-phenotype index were low. This supports the theoretical results discussed previously, which indicate that MAS for traits with high heritabilities will not be much better than phenotypic selection on a per-cycle basis.

Tanksley et al. (1996) mapped QTLs in advanced backcross populations. The objective of their method was to detect QTLs with large effects introgressed from wild tomato species into cultivated tomato. A priori, they knew that most of the alleles from the wild parent were not favorable for cultivated tomatoes. Therefore, they were not concerned about detecting all QTLs in wild x cultivated tomato crosses; they were only interested in detecting those QTLs at which the wild parent could contribute favorable alleles for yield or fruit quality (quantitative traits).
Thus, they combined QTL mapping with traditional backcross breeding: wild x cultivated tomato F1 s were backcrossed several generations to the cultivated parent, while visual selection was practiced against obviously unfavorable phenotypes. Only good looking backcross lines were evaluated in replicated yield trials and genotyped. QTLs were detected for which the wild parent could contribute favorable alleles, and lines were selected based on genotypes for final backcrosses to the recurrent parent. This method was effective at improving important traits in tomato, and is probably generally useful in cases where genes with reasonable large favorable effects exist in unadapted germplasm, although direct evaluations of unadapted germplasm generally provide little indication of the likelihood of their possessing favorable alleles for elite population (Tanksley and McCouch, 1997). Summary of MAS: Success, Problems, and Challenges Optimum situations for MAS DNA markers are being rather widely used for backcross breeding programs to introgress major genes with specific effects, including transgenes, into elite backgrounds (Lee, 1995). MAS is ideally suited to such situations, as it allows direct selection for genotypes with minimum linkage drag, faster recovery of the recurrent background, and selection among juvenile plants and in environments for which the gene effects cannot be directly or easily evaluated. The utility of MAS in more general breeding programs for selection of multiple genes affecting multiple traits of importance has not been generally established. Perhaps, like any other breeding method, it

will be useful only in certain circumstances. In summary, the use QTL-marker associations in plant breeding programs will be advantageous in the following situations: 1. Traits with low heritability but for which the QTLs explain most of the genetic variance. This implies a contradiction: Trait heritability is the most important factor influencing the effectiveness of MAS. MAS seems to be most promising for traits with low heritability. But trait heritability is also of major importance for accuracy in the mapping of QTLs. Low heritability reduces the power of detecting QTLs, which is based on the correlation between phenotype and marker genotype. This could mean that for well-mapped QTLs MAS may add little to phenotypic selection, while for traits with a very low heritability the underlying QTLs cannot be identified. It is the area in between these two extremes that looks most promising for the application of MAS. (van Berloo and Stam, 1998) 2. Situations in which phenotypic selection cannot be applied in all cycles of intermating or selfing, such as when off-season nurseries are employed for line development. In these cases, MAS can be used in all generations, which can improve response to selection per unit of time, even if it is not advantageous on a per generation scale. This may make MAS more widely applicable to commercial plant breeding programs than the literature might immediately lead one to believe. This may also be an example where knowledge of gene effects can be transferred across generations in the absence of additional phenotypic evaluation, in contrast to Bernardo s (2001) simulation model that required phenotypic evaluation of gene effects anew in each generation. 3. Incorporation of rare, favorable alleles from exotic or wild species germplasm? 
If exotic germplasm is phenotypically poor or agronomically unadapted, yet contains favorable alleles at a low frequency that are absent from the elite cultivated gene pool, marker-assisted identification and incorporation of favorable exotic QTL alleles may be more effective than solely phenotypically based selection. This seems to have been true for tomato (Tanksley and McCouch, 1997).

Statistical problems

We summarized our understanding of QTL mapping by noting the following statistical problems: QTL positions are identified with poor precision, type I errors are problematic due to multiple testing within a genome, and both overestimation of QTL effects and type II errors (missing QTLs with small effects) occur when sample sizes are small and phenotypic evaluations are insufficiently replicated. Which of these problems will seriously affect MAS?

- Imprecision of QTL locations has only a small effect, as long as selection is performed on marker loci linked within 10 cM of the QTL (van Berloo and Stam, 1998).
- Type I errors (including markers that are not actually linked to a QTL) have an insignificant effect on MAS (Hospital et al., 1997; van Berloo and Stam, 1998).
- Type II errors (missing real QTLs) are more serious (van Berloo and Stam, 1998).
- MAS based on small populations is generally less efficient than that based on larger sample sizes (Hospital et al., 1997), but see Edwards and Page (1994).

Practical challenges

The main practical difficulty in integrating MAS into traditional, commercial plant breeding programs aimed at improving multiple traits controlled by multiple genes is the number of different populations that breeders normally handle. Marker-trait associations are population-dependent (Strauss et al., 1992) because a marker allele that is linked to a favorable QTL allele in one population may be linked to a less favorable allele in an unrelated population. Alternatively, the QTL alleles may not be segregating in another population. Beer et al. (1997) used DNA marker profiles of a broad range of oat germplasm in an attempt to infer linkages to QTLs detected in a single-cross mapping population. Marker alleles associated with favorable QTL effects in the mapping population were associated equally often with favorable and unfavorable QTL effects in the oat germplasm sample. Thus, identification of QTLs in one breeding population may not help the breeder select in any other population.

Can breeders afford to conduct QTL-mapping studies in every population of interest? Surely not! Some method is needed that will allow MAS to be implemented with a minimal set of DNA markers across a broad range of breeding populations. Perhaps a combination of pedigree information, DNA marker profiles of important breeding lines or other germplasm, and a limited number of well-designed QTL-mapping studies will allow effective use of marker-trait associations across a useful range of breeding populations. This may be the most important challenge for MAS in the near future.
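The population dependence of marker-trait associations described above can be illustrated with a small sketch. Everything in it is hypothetical: the marker names, the additive effects, the three lines, and the genotype coding (AA = 1, AB = 0, BB = -1, so that each marker contributes its additive effect a times the code) are assumptions made for illustration, not data from any study cited here. The score function is a simple additive sum in the spirit of a net molecular score. The second population is assumed to differ only in that the linkage phase at marker M2 is reversed, so the marker allele that tagged the favorable QTL allele in the mapping population now tags the unfavorable one.

```python
# Hypothetical additive effects (a) of the "A" marker allele,
# as estimated in a single mapping population.
mapping_effects = {"M1": 2.0, "M2": 1.5, "M3": 1.0}

# Assumed true effects in an unrelated population: the linkage phase
# at M2 is reversed, so the sign of its effect flips.
other_pop_effects = {"M1": 2.0, "M2": -1.5, "M3": 1.0}

# Hypothetical inbred lines, coded AA = 1, AB = 0, BB = -1 at each marker.
lines = {
    "line1": {"M1": 1, "M2": 1, "M3": -1},
    "line2": {"M1": 1, "M2": -1, "M3": -1},
    "line3": {"M1": -1, "M2": 1, "M3": 1},
}


def net_molecular_score(genotype, effects):
    """Sum of additive effect times genotype code over all markers."""
    return sum(effects[marker] * genotype[marker] for marker in effects)


# Scores predicted from the mapping-population effects.
predicted = {name: net_molecular_score(g, mapping_effects)
             for name, g in lines.items()}

# Marker-based values under the assumed effects in the other population.
actual = {name: net_molecular_score(g, other_pop_effects)
          for name, g in lines.items()}

best_predicted = max(predicted, key=predicted.get)
best_actual = max(actual, key=actual.get)

for name in lines:
    print(name, "predicted:", predicted[name], "actual:", actual[name])
print("Selected on mapping-population effects:", best_predicted)
print("Best under the other population's effects:", best_actual)
```

In this toy case a single phase reversal is enough to change which line ranks first, which is the kind of outcome that makes effects estimated in one population unreliable for selection in another.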

REFERENCES

Beer, S.C., W. Siripoonwiwat, L.S. O'Donoughue, E. Souza, D. Matthews, M.E. Sorrells, 1997 Associations between molecular markers and quantitative traits in an oat germplasm pool: Can we infer linkages? Journal of Quantitative Trait Loci. 3: Article 1. http://probe.nalusda.gov:8000/otherdocs/jqtl.

Bernardo, R., 2001 What if we knew all the genes for a quantitative trait in hybrid crops? Crop Science. 41: 1-4.

Eathington, S.R., J.W. Dudley, G.K. Rufener II, 1997 Usefulness of marker-QTL associations in early generation selection. Crop Science. 37: 1686-1693.

Edwards, M., L. Johnson, 1994 RFLPs for rapid recurrent selection. Analysis of Molecular Marker Data. American Society for Horticultural Science and Crop Science Society of America, Corvallis, OR. pp. 33-40.

Edwards, M.D., N.J. Page, 1994 Evaluation of marker-assisted selection through computer simulation. Theoretical and Applied Genetics. 88: 376-382.

Hospital, F., L. Moreau, F. Lacoudre, A. Charcosset, A. Gallais, 1997 More on the efficiency of marker-assisted selection. Theoretical and Applied Genetics. 95: 1181-1189.

Jiang, C., Z.-B. Zeng, 1995 Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics. 140: 1111-1127.

Lande, R., R. Thompson, 1990 Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics. 124: 743-756.

Lee, M., 1995 DNA markers and plant breeding programs. Advances in Agronomy. 55: 265-344.

Strauss, S.H., R. Lande, G. Namkoong, 1992 Limitations of molecular-marker-aided selection in forest tree breeding. Canadian Journal of Forest Research. 22: 1050-1061.

Stromberg, L.D., J.W. Dudley, G.K. Rufener, 1994 Comparing conventional early generation selection with molecular marker assisted selection in maize. Crop Science. 34: 1221-1225.

Tanksley, S.D., S.R. McCouch, 1997 Seed banks and molecular maps: Unlocking genetic potential from the wild. Science. 277: 1063-1066.

Tanksley, S.D., J.C. Nelson, 1996 Advanced backcross QTL analysis: A method for the simultaneous discovery and transfer of valuable QTLs from unadapted germplasm into elite breeding lines. Theoretical and Applied Genetics. 92: 191-203.

van Berloo, R., P. Stam, 1998 Marker-assisted selection in autogamous RIL populations: A simulation study. Theoretical and Applied Genetics. 96: 147-154.