Experimental Design and Sample Size Requirement for QTL Mapping

Zhao-Bang Zeng
Bioinformatics Research Center, Departments of Statistics and Genetics
North Carolina State University
zeng@stat.ncsu.edu

Experimental Designs

Crosses from divergent inbred lines, populations and species:
- Backcross (BC): two genotypes at a locus (similar to RI); simple to analyze.
- F2: three genotypes at a locus, so both additive and dominance effects can be estimated.
  - More complex to analyze, particularly for multiple QTL with epistasis.
  - More opportunity and information to examine the genetic structure or architecture of QTL.
  - More power than BC for QTL analysis.

Recombinant inbred lines (RI)
- Higher mapping resolution, because more recombination occurred in constructing the RI lines.
- The mean phenotype of a line can be measured more precisely by using multiple individuals per line, i.e. heritability can be increased. This is potentially a very large advantage for QTL analysis and a major factor in power calculation and sample size requirement.

Advanced generations of a cross: F3, F4, ...
- By selfing: leads to RI.
- By random mating: increases recombination, expands the length of the linkage map, and increases the mapping resolution (estimation of QTL position).

Doubled haploid: similar to BC and RI in analysis

Repeated backcross

Testcross

NC design III (marker genotype data on F2 or F3, and trait phenotype data on both backcrosses from F2 or F3)

Other populations used for QTL analysis

Cross from segregating populations (no inbred lines available):
- Similar model and analysis procedure as for an inbred cross, but more complex in analysis. The probability of allelic origin must be estimated for each genomic point from the observed markers.
- Less powerful for QTL analysis (QTL alleles may not be preferentially fixed in the parental populations).
- Power calculation is more difficult (more unknowns).

Half sibs:
- Analyze the segregation of one parent; similar to backcross in model and analysis.
- Less powerful for QTL detection, because of more uncontrollable variability from the other parents.
- Analyze the allelic effect difference within one parent, not the allelic effect difference between widely differentiated inbred lines, populations or species. The relevant heritability for QTL analysis is generally low.

Full sibs:
- Four genotypes at a locus; allelic substitution effects can be estimated for the male and female parents, together with their interaction (dominance).
- Twice the information for QTL analysis compared with half sibs; should be more powerful.
- Note, however, that the double pseudo-backcross approach to mapping analysis does NOT use the full genetic information (it actually uses less than half of the information available) and is not powerful for QTL identification. Power calculation depends on how the data are analyzed.

Complex pedigree: go fishing

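Power and sample size calculation

First a simple case (a point of departure): one marker and one QTL in an F2 population. Assume that the QTL genotypic effects are

Genotype   AA   Aa   aa
Effect      a    d   -a

With $n$ the sample size, $r$ the recombination frequency between marker and QTL, and $\sigma_r^2$ the residual variance, the tests for marker effects are

$$t_1 = \frac{\mu_{MM} - \mu_{mm}}{\sqrt{\dfrac{\sigma_r^2}{n/4} + \dfrac{\sigma_r^2}{n/4}}} = \frac{(1-2r)\,2a}{\sqrt{8\sigma_r^2/n}} \qquad (1)$$

and

$$t_2 = \frac{\mu_{Mm} - \dfrac{\mu_{MM}}{2} - \dfrac{\mu_{mm}}{2}}{\sqrt{\dfrac{\sigma_r^2}{n/2} + \dfrac{\sigma_r^2}{n} + \dfrac{\sigma_r^2}{n}}} = \frac{(1-2r)\,d}{\sqrt{4\sigma_r^2/n}} \qquad (2)$$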

Note that $\mu_{Mm}$ does not contribute to the test in (1); adding $\mu_{Mm}$ to (1) does not increase the efficiency of the test unless $d > a/2$ (but see below for the calculation of the sample size required with dominance).

When $n$ is large, the observed statistic $\hat{t}$ is approximately normally distributed, and the power $1-\beta$ to detect the difference (for a one-tailed test) is

$$1-\beta = \mathrm{Prob}\left[\hat{t} > z_\alpha \mid \hat{t} \sim N(t, 1)\right] \qquad (3)$$

$$\phantom{1-\beta} = 1 - \Phi(z_\alpha - t) \qquad (4)$$

where $z_\alpha$ is the critical $z$ value of the test at the $(1-\alpha)$ confidence level under the null hypothesis $t = 0$, and $\Phi(x)$ is the standard normal cumulative distribution function. $\alpha$ is the type I error rate and $\beta$ is the type II error rate.
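As a rough numerical sketch of (3) and (4), the power can be evaluated with the standard normal distribution from Python's standard library. The function name `power_one_tailed` and the example values are illustrative assumptions, not part of the original slides.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def power_one_tailed(t: float, alpha: float) -> float:
    """Power 1 - beta = 1 - Phi(z_alpha - t) of a one-tailed test.

    t     : expected value of the test statistic under the alternative
    alpha : type I error rate of the test
    """
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)  # upper-tail critical value
    return 1.0 - STD_NORMAL.cdf(z_alpha - t)


if __name__ == "__main__":
    # Hypothetical example: expected statistic t = 4.37 tested at alpha = 0.001
    print(round(power_one_tailed(4.37, 0.001), 3))
```

Two sanity checks follow directly from (4): when t = 0 the power equals α, and when t equals the critical value the power is exactly one half.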

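For given $\alpha$ and $\beta$, the sample size $n$ required for the test is determined by

$$n_1 = 8\left[\frac{z_\alpha + z_\beta}{(1-2r)\,2a/\sigma_r}\right]^2 \quad \text{for the additive effect} \qquad (5)$$

$$n_2 = 4\left[\frac{z_\alpha + z_\beta}{(1-2r)\,d/\sigma_r}\right]^2 \quad \text{for the dominance effect} \qquad (6)$$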
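Equations (5) and (6) can be sketched in a few lines of Python. The function names and the choice to pass the effects already scaled by the residual standard deviation (a/σ_r and d/σ_r) are assumptions made for this illustration only.

```python
from statistics import NormalDist


def z_sum(alpha: float, beta: float) -> float:
    """z_alpha + z_beta for a one-tailed test."""
    nd = NormalDist()
    return nd.inv_cdf(1.0 - alpha) + nd.inv_cdf(1.0 - beta)


def n_additive(a_over_sigma: float, r: float, alpha: float, beta: float) -> float:
    """Equation (5): n1 = 8 * [(z_alpha + z_beta) / ((1 - 2r) * 2a / sigma_r)]^2."""
    effect = (1.0 - 2.0 * r) * 2.0 * a_over_sigma
    return 8.0 * (z_sum(alpha, beta) / effect) ** 2


def n_dominance(d_over_sigma: float, r: float, alpha: float, beta: float) -> float:
    """Equation (6): n2 = 4 * [(z_alpha + z_beta) / ((1 - 2r) * d / sigma_r)]^2."""
    effect = (1.0 - 2.0 * r) * d_over_sigma
    return 4.0 * (z_sum(alpha, beta) / effect) ** 2


if __name__ == "__main__":
    # Hypothetical inputs: a = 0.5 sigma_r, marker at the QTL (r = 0)
    print(n_additive(0.5, 0.0, alpha=0.05, beta=0.1))
```

Because the effect enters squared in the denominator, halving the standardized effect quadruples the required sample size.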

Several points on determining the required sample size

1. If the test is two-tailed (the usual case), $z_\alpha$ should be replaced by $z_{\alpha/2}$.

2. For interval mapping, the required sample size can be reduced: the term $(1-2r)^2$ in (5) and (6) can be replaced by $(1 - r_M)$, where $r_M$ is the recombination frequency between the two marker loci flanking the interval. Example: $r_M$ is about 0.23 for a 30 cM interval. Then $(1-2r)^2$ in (5) and (6) can be replaced by $(1 - r_M) = 0.77$ to account for the worst case, in which the QTL is located in the middle of the interval ($r \approx r_M/2$).
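The value 0.23 for a 30 cM interval can be checked numerically. The sketch below assumes Haldane's map function (no crossover interference); the slides do not say which map function was used, so this is only one plausible reading, and the variable names are illustrative.

```python
import math


def haldane_r(distance_cM: float) -> float:
    """Recombination frequency for a map distance in cM (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cM / 100.0))


# Recombination frequency between the two markers flanking a 30 cM interval
r_interval = haldane_r(30.0)

# Recombination frequency between one marker and a QTL at the interval midpoint
r_mid = haldane_r(15.0)

# Term appearing in (5) and (6) for a single-marker test at the midpoint
single_marker_factor = (1.0 - 2.0 * r_mid) ** 2

# Replacement term suggested for interval mapping
interval_factor = 1.0 - r_interval

if __name__ == "__main__":
    print(round(r_interval, 2), round(single_marker_factor, 2), round(interval_factor, 2))
```

Under this assumption the single-marker term is about 0.55 while the interval-mapping term is about 0.77, which is where the saving in sample size comes from.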

3. If the test also uses many unlinked markers to control the genetic background, most of the genetic variance in the population can be removed from the residual variance (the idea of composite interval mapping), and $\sigma_r^2$ may be roughly approximated by the environmental variance $\sigma_e^2$. The overall heritability of the trait matters enormously.

4. For a systematic search for QTL across a genome, the type I error rate $\alpha$ for each test should be substantially lower, to account for the increased false positive probability in the overall search. In most cases, using $\alpha = 0.001$ (a very conservative level) for each individual test should be sufficient to ensure an overall false positive rate of less than 5%.
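For orientation only, the relation between a per-test level and the overall false positive rate can be sketched under the simplifying assumption of independent tests. Tests at linked markers in a genome scan are correlated, so this is not how the slides derive the 0.001 figure; the function names are hypothetical.

```python
import math


def overall_false_positive(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def max_independent_tests(alpha: float, overall: float) -> int:
    """Largest m for which the overall false positive rate stays at or below `overall`."""
    return math.floor(math.log(1.0 - overall) / math.log(1.0 - alpha))


if __name__ == "__main__":
    print(max_independent_tests(0.001, 0.05))
    print(overall_false_positive(0.001, 50))
```

Under independence, a per-test level of 0.001 keeps the overall rate below 5% for roughly fifty effectively independent tests.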

These points suggest that the relevant sample size be calculated as

$$n_1 \approx \frac{8}{0.77}\left[\frac{z_\alpha + z_\beta}{2a/\sigma_e}\right]^2 \quad \text{for the additive effect} \qquad (7)$$

It now remains to determine the likely magnitude of $2a/\sigma_e$. Suppose that a QTL contributes a proportion $f$ of the genetic variance $\sigma_g^2$ in an F2 population. Assuming that no other genes are linked to the QTL and ignoring dominance ($d = 0$; see below),

$$\frac{(2a)^2}{8\sigma_e^2} = f\,\frac{\sigma_g^2}{\sigma_e^2}.$$

The ratio $\sigma_g^2/\sigma_e^2$ is an unknown quantity.

Example: assuming $h^2_{F_2} = \sigma_g^2/(\sigma_e^2 + \sigma_g^2) = 0.6$ means $\sigma_g^2/\sigma_e^2 = 1.5$ and $(2a)^2/\sigma_e^2 = 12f$. Given $\alpha = 0.001$ and $\beta = 0.1$ ($z_{0.001} + z_{0.1} = 3.09 + 1.28 = 4.37$), the required sample sizes for detecting a leading QTL with $f$ = 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4 and 0.5 are

f    0.01   0.02   0.05   0.1   0.2   0.3   0.4   0.5
n    1653    826    330   165    82    55    41    33
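The table can be reproduced with a short script. This is a sketch that uses the rounded value 3.09 + 1.28 = 4.37 from the slide and truncates to whole individuals, which is what matches the printed numbers; the function name `required_n` and its default arguments are illustrative assumptions.

```python
Z_SUM = 3.09 + 1.28  # rounded z_0.001 + z_0.1, as given in the example


def required_n(f: float, h2: float = 0.6, interval_factor: float = 0.77,
               z_sum: float = Z_SUM) -> float:
    """Equation (7) with (2a)^2 / sigma_e^2 = 8 * f * sigma_g^2 / sigma_e^2.

    f  : proportion of the genetic variance explained by the QTL
    h2 : heritability in the F2, sigma_g^2 / (sigma_g^2 + sigma_e^2)
    """
    var_ratio = h2 / (1.0 - h2)        # sigma_g^2 / sigma_e^2 (1.5 when h2 = 0.6)
    effect_sq = 8.0 * f * var_ratio    # (2a / sigma_e)^2 (12 f when h2 = 0.6)
    return 8.0 / interval_factor * z_sum ** 2 / effect_sq


if __name__ == "__main__":
    for f in (0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
        print(f, int(required_n(f)))  # truncated, as in the table
```

In practice one would round up rather than truncate, since the computed value is a minimum.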

Effects of dominance

Depending on the degree of dominance, the sample size required for detecting the dominance effect may need to be substantially increased. Dominance does not, however, affect the calculation of the power of detecting the QTL. For example, suppose $d = a$. In this case we may use

$$t_3 = \frac{\mu_{M-} - \mu_{mm}}{\sqrt{\dfrac{\sigma_r^2}{3n/4} + \dfrac{\sigma_r^2}{n/4}}} = \frac{(1-2r)\,2a}{\sqrt{16\sigma_r^2/(3n)}},$$

where $\mu_{M-}$ is the mean of the pooled MM and Mm marker classes. But because of dominance,

$$\frac{3(2a)^2}{16} = f\sigma_g^2.$$

Thus, as long as $f$, the proportion of the genetic variation attributed to the QTL, is fixed, the required sample size for the test is unchanged.
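The claim that the sample size is unchanged can be verified with a short worked derivation. It is a sketch that assumes F2 genotype frequencies of 1/4, 1/2, 1/4 at the QTL and writes $z = z_\alpha + z_\beta$; the intermediate steps are not in the original slides.

$$\begin{aligned}
\text{QTL variance with } d = a:\quad & \tfrac14 a^2 + \tfrac12 a^2 + \tfrac14 a^2 - \left(\tfrac{a}{2}\right)^2 = \tfrac{3a^2}{4} = \tfrac{3(2a)^2}{16} = f\sigma_g^2 \\
\text{setting } t_3 = z:\quad & n = \frac{16}{3}\,\frac{z^2\sigma_r^2}{(1-2r)^2(2a)^2} = \frac{z^2\sigma_r^2}{(1-2r)^2\,f\sigma_g^2} \\
\text{additive case } \left(d = 0,\ (2a)^2 = 8f\sigma_g^2\right):\quad & n_1 = \frac{8\,z^2\sigma_r^2}{(1-2r)^2(2a)^2} = \frac{z^2\sigma_r^2}{(1-2r)^2\,f\sigma_g^2}
\end{aligned}$$

Both cases reduce to the same expression in $f\sigma_g^2$, so for a fixed $f$ the required $n$ does not depend on whether the QTL acts additively or with complete dominance.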

Effect of linkage: multiple linked QTL

Two issues:

Detection of QTL on the chromosome
- For two linked QTL, if the model is misspecified (two QTL analyzed as one), the power to identify the one QTL is based on the joint effect of the two QTL (a weighted sum).
- If the two QTL are in coupling linkage, the joint effect is aggregated and power is increased.
- If the two QTL are in repulsion linkage, the joint effect is reduced and power is decreased; it can be very low.
- However, if we can identify the correct model (searching for two QTL, or conditional searching), the issue becomes one of separating linked QTL, and the power to identify repulsion-linked QTL is not necessarily very low.

Separating linked QTL (identifying both QTL)
- The required sample size is increased by a factor (Zeng 1993)

$$\frac{\sigma_i^2}{\sigma_{i|j}^2} = \frac{1/4}{r(1-r)} = \frac{1}{4r(1-r)}$$

r              0.5    0.4    0.3    0.2    0.15   0.1
1/[4r(1-r)]    1      1.04   1.19   1.56   1.96   2.78

r              0.09   0.08   0.07   0.06   0.05   0.04   0.03   0.02    0.01
1/[4r(1-r)]    3.05   3.40   3.84   4.43   5.26   6.51   8.59   12.76   25.25
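The inflation factor 1/[4r(1-r)] is easy to tabulate for other recombination frequencies. The function name below is a hypothetical choice for this sketch.

```python
def separation_factor(r: float) -> float:
    """Sample size inflation factor 1 / (4 r (1 - r)) for separating two linked QTL.

    r : recombination frequency between the two loci, 0 < r <= 0.5
    """
    if not 0.0 < r <= 0.5:
        raise ValueError("r must lie in (0, 0.5]")
    return 1.0 / (4.0 * r * (1.0 - r))


if __name__ == "__main__":
    for r in (0.5, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05, 0.01):
        print(r, round(separation_factor(r), 2))
```

The factor equals 1 for unlinked loci (r = 0.5) and grows without bound as r approaches 0, which is why very tightly linked QTL are so costly to separate.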

Comments

QTL detection and power calculation depend on the QTL mapping analysis procedure:
- Composite interval mapping is more powerful than simple interval mapping.
- Multiple interval mapping is more powerful than composite interval mapping.

The power of the test can be increased by combining information from multiple related traits, multiple crosses, multiple environments, ...
- The genetic structure becomes more complex, and so does the statistical analysis.
- But there are definite advantages in joint multiple-trait analysis for QTL identification (Jiang and Zeng 1995), and of course for hypothesis testing (pleiotropy) and parameter estimation.

How large a sample do I need for my QTL mapping experiment?
- What is the heritability of your trait (any knowledge or guess)?
- How large a QTL effect (as a minimum) do you aim to detect? For example, a QTL that explains 5% of the variation.
- What is the likely complexity of the genetic architecture of the QTL? How many QTL, the distribution of their effects, epistasis, ...