Statistical Analysis of Genetic and Phenotypic Data for Breeders

Angela Pacheco, R.A.PACHECO@cgiar.org
Francisco M. Rodríguez, F.R.Huerta@cgiar.org
Gregorio Alvarado, g.alvarado@cgiar.org
Head of BSU: Juan Burgueño, J.Burgueno@cgiar.org
Summer

- What is META-R
- Descriptive Statistics
- Compute BLUEs and BLUPs
- Compute Genetic Correlations among Locations
- Compute Genetic Correlations between Traits

META-R (Multi-Environment Trial Analysis using R) is software that analyzes a set of genotypes previously evaluated in one or more environments (locations, years), including different management conditions such as water stress or low nitrogen.

Why apply METs?
Because statistical analysis can detect and explain whether genotype performance is repeatable across environments or whether genotypes interact with environments (genotype-by-environment interaction).
Objectives:
- Detect the genotypes with the best performance in a single environment and across environments
- Identify and define mega-environments
- Identify associations between traits (indirect response to selection)
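To make genotype-by-environment interaction concrete, the sketch below splits a complete table of genotype means per environment into additive main effects and interaction residuals, y_ij - (genotype mean) - (environment mean) + (grand mean). The function name `gxe_interaction` and the yields are hypothetical, not part of META-R; a residual far from zero means the genotype does better or worse in that environment than its main effects predict.

```python
def gxe_interaction(table):
    """Return the G x E interaction residuals of a genotype-by-environment table.

    table[i][j] is the mean of genotype i in environment j (complete, no missing cells).
    Each residual is y_ij - genotype_mean_i - environment_mean_j + grand_mean.
    """
    n_gen = len(table)
    n_env = len(table[0])
    gen_means = [sum(row) / n_env for row in table]
    env_means = [sum(table[i][j] for i in range(n_gen)) / n_gen for j in range(n_env)]
    grand = sum(gen_means) / n_gen
    return [
        [table[i][j] - gen_means[i] - env_means[j] + grand for j in range(n_env)]
        for i in range(n_gen)
    ]


# Hypothetical yields (t/ha): rows are genotypes G1, G2, G3; columns are 2 environments.
means = [
    [5.0, 3.0],
    [4.0, 4.0],
    [6.0, 2.0],
]
for row in gxe_interaction(means):
    print([round(v, 2) for v in row])
```

Here all three genotypes have the same overall mean, but G2 and G3 swap ranking between the two environments, which shows up as interaction residuals of plus or minus 1, while G1 shows none.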

What is the core component of METs?
Experimental design. Why? Because it lets us better control plot-to-plot variability. Most crop breeding programs use two main experimental designs:
- Randomized complete block design (RCBD), with parameters estimated by ordinary least squares (OLS) or generalized least squares (GLS)
- Incomplete block designs, such as lattice or alpha-lattice designs, with parameters estimated by restricted maximum likelihood (REML)
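For a balanced RCBD in one environment, the OLS fit reduces to simple arithmetic on genotype and block means. The sketch below computes the genotype means and the ANOVA sums of squares, assuming no missing plots and fixed genotype effects; the function name `rcbd_anova` and the data are hypothetical, not META-R code.

```python
def rcbd_anova(y):
    """Analysis of a balanced RCBD by ordinary least squares.

    y[g][b] is the plot value of genotype g in block b (no missing plots).
    Returns the genotype means (the OLS estimates of genotype means when
    genotypes are fixed), the sums of squares, and the error mean square.
    """
    g, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (g * b)
    gen_means = [sum(row) / b for row in y]
    block_means = [sum(y[i][j] for i in range(g)) / g for j in range(b)]
    ss_total = sum((y[i][j] - grand) ** 2 for i in range(g) for j in range(b))
    ss_gen = b * sum((m - grand) ** 2 for m in gen_means)
    ss_block = g * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_gen - ss_block  # residual after genotypes and blocks
    df_error = (g - 1) * (b - 1)
    return {
        "genotype_means": gen_means,
        "ss_genotype": ss_gen,
        "ss_block": ss_block,
        "ss_error": ss_error,
        "df_error": df_error,
        "ms_error": ss_error / df_error,
    }


# Hypothetical data: 3 genotypes (rows) in 2 blocks (columns).
y = [
    [4.0, 5.0],
    [6.0, 8.0],
    [5.0, 5.0],
]
result = rcbd_anova(y)
print(result["genotype_means"], result["ms_error"])
```

With missing plots or incomplete blocks these closed-form sums of squares no longer apply, which is one reason mixed-model fitting by REML is used for lattice and alpha-lattice designs.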

The choice of design depends on:
- Number of genotypes to be evaluated
- Field conditions
- Soil homogeneity
- Weather conditions
Example: if a small number of genotypes will be evaluated under homogeneous and optimal field conditions, an RCBD is a good choice. However, if the number of genotypes is large and field conditions are less favorable, including biotic and abiotic stress factors such as nutrient deficiency and/or limited water availability, then small sub-blocks should be used to reduce within-environment variation.
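As an illustration of the RCBD structure, where every block contains all genotypes, the sketch below randomizes the genotype order independently within each block. The function name `randomize_rcbd` and the genotype labels are hypothetical. With many genotypes each complete block becomes large and less homogeneous, which is why incomplete-block designs such as alpha-lattices, usually generated with dedicated software, are preferred in that case.

```python
import random


def randomize_rcbd(genotypes, n_blocks, seed=None):
    """Return a field layout for a randomized complete block design.

    Every block contains each genotype exactly once, and the order of
    genotypes is shuffled independently within each block.
    """
    rng = random.Random(seed)  # seeded generator so the layout can be reproduced
    layout = []
    for _ in range(n_blocks):
        block = list(genotypes)
        rng.shuffle(block)
        layout.append(block)
    return layout


for i, block in enumerate(randomize_rcbd(["G1", "G2", "G3", "G4"], n_blocks=3, seed=1), start=1):
    print(f"Block {i}: {block}")
```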

Descriptive Statistics
Statistics starts with a problem, continues with data collection, proceeds with data analysis and finishes with conclusions. It is a common mistake to plunge into a complex analysis without paying attention to what the objectives are, or even to whether the data are appropriate for the proposed analysis.
"The formulation of a problem is often more essential than its solution, which may be merely a matter of mathematical or experimental skill." (Albert Einstein)

It is important to understand how the data were collected:
- Are the data observational or experimental?
- Are there missing values?
- How are the data coded?
- What are the units of measurement?
- Beware of data-entry errors. This last problem is all too common; it is almost a certainty in any real dataset of at least moderate size.
Perform some data sanity checks.
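A few of these sanity checks can be automated. The sketch below assumes plot records stored as Python dictionaries with a hypothetical trait key `yield_t_ha` and a plausible range chosen by the breeder; the function `sanity_check` is illustrative only, and it flags plots for inspection without changing the data.

```python
def sanity_check(records, trait, low, high):
    """Flag missing and out-of-range values of one trait.

    records: list of dicts, one per plot; a missing value is None or an absent key.
    low, high: plausible range for the trait, chosen by the breeder.
    Returns (missing_plots, out_of_range_plots) as lists of plot identifiers.
    """
    missing, out_of_range = [], []
    for rec in records:
        value = rec.get(trait)
        if value is None:
            missing.append(rec["plot"])
        elif not (low <= value <= high):
            out_of_range.append(rec["plot"])
    return missing, out_of_range


plots = [
    {"plot": 1, "genotype": "G1", "yield_t_ha": 5.2},
    {"plot": 2, "genotype": "G2", "yield_t_ha": None},
    {"plot": 3, "genotype": "G3", "yield_t_ha": 52.0},  # likely a misplaced decimal point
    {"plot": 4, "genotype": "G4", "yield_t_ha": 4.8},
]
print(sanity_check(plots, "yield_t_ha", low=0.0, high=20.0))
# -> ([2], [3])
```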

Initial data analysis
This is a critical step that should always be performed. It looks simple, but it is vital.
- Numerical summaries: means, standard deviations, five-number summaries (minimum, first quartile, median, third quartile, maximum), etc.
- Graphical summaries: for one variable, box plots, histograms, etc.; for two variables, scatterplots.
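The numerical summaries above can be computed with Python's standard `statistics` module. This is a sketch with hypothetical plot yields and a hypothetical helper name; quartile definitions differ between software packages, so Q1 and Q3 may not match other tools exactly.

```python
import statistics


def numerical_summary(values):
    """Mean, sample standard deviation and five-number summary of one trait.

    Quartiles use statistics.quantiles(method="inclusive"); other software may
    use a slightly different quartile definition.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


yields = [3.1, 4.0, 4.2, 4.8, 5.0, 5.5, 6.3]  # hypothetical plot yields, t/ha
for name, value in numerical_summary(yields).items():
    print(f"{name:>6}: {value:.3g}")
```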

When doing an analysis: what can go wrong?
Many things, unfortunately. The source and quality of the data directly affect what conclusions we can draw. Look for outliers, data-entry errors and skewed or unusual distributions. Are the data distributed as you expect? Getting data into a form suitable for analysis by cleaning out mistakes and aberrations is often time consuming; it can take more time than the data analysis itself.
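One common rule of thumb for flagging outliers (assumed here; the slides do not specify a rule) is Tukey's fences: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 - Q1. A minimal sketch with hypothetical yields:

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


yields = [4.1, 4.5, 4.7, 5.0, 5.2, 5.4, 52.0]  # 52.0 is a likely data-entry error
print(tukey_outliers(yields))
# -> [52.0]
```

Flagged values should be checked against field notes before they are corrected or removed; an extreme value may be genuine.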

Compute BLUEs and BLUPs
Procedures for obtaining estimators and predictors for genotypes have been developed (OLS or GLS, and likelihood theory when mixed models are used). The most common are Best Linear Unbiased Estimators (BLUEs) and Best Linear Unbiased Predictors (BLUPs). They are:
- Best, in the sense that they have minimum variance (BLUEs) or minimum prediction error variance (BLUPs) among linear unbiased alternatives
- Linear, because they are linear functions of the observed phenotypes
- Unbiased, because their expected values equal those of the quantities they estimate or predict
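A minimal sketch of the difference, assuming a balanced trial in one environment with `reps` replicates and variance components that have already been estimated (for example by REML); the function `blues_and_blups` and the numbers are hypothetical. When genotypes are treated as fixed, the BLUE of each genotype mean is its arithmetic mean. When they are random, the BLUP of the genotype effect shrinks the deviation from the grand mean by the factor var_g / (var_g + var_e / reps), which is the entry-mean heritability.

```python
def blues_and_blups(gen_means, grand_mean, var_g, var_e, reps):
    """Contrast BLUEs and BLUPs for a balanced single-environment trial.

    gen_means: genotype means over `reps` replicates (the BLUEs when genotypes
    are fixed and the design is balanced).
    var_g, var_e: genetic and error variance components, assumed already estimated.
    Returns a list of (BLUE, grand mean + BLUP of the genotype effect) pairs.
    """
    shrinkage = var_g / (var_g + var_e / reps)  # entry-mean heritability
    return [(m, grand_mean + shrinkage * (m - grand_mean)) for m in gen_means]


means = [6.0, 5.0, 4.0]  # hypothetical genotype means, t/ha
for blue, blup in blues_and_blups(means, grand_mean=5.0, var_g=0.3, var_e=0.6, reps=2):
    print(f"BLUE = {blue:.2f}   grand mean + BLUP = {blup:.2f}")
```

With these inputs the shrinkage factor is 0.5, so the predicted means are pulled halfway toward the grand mean, while the BLUEs are unchanged.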

Linear Models that META-R Uses
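The model equations shown on the original slides are not reproduced in this transcript. As an illustration only, a model commonly used for an RCBD evaluated across environments is the following, where y_ijk is the value of genotype k in replicate j of environment i, μ is the grand mean, E_i is the environment effect, R_j(i) is the effect of replicate j within environment i, G_k is the genotype effect, (GE)_ik is the genotype-by-environment interaction and ε_ijk is the residual:

```latex
y_{ijk} = \mu + E_i + R_{j(i)} + G_k + (GE)_{ik} + \varepsilon_{ijk}
```

For an alpha-lattice design, a term B_l(ij) for incomplete block l within replicate j of environment i is typically added. Treating G_k as fixed yields BLUEs of genotype means; treating it as random yields BLUPs.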

Covariance Analysis
Sometimes we suspect that a trait (MRV) is affected by another trait or set of traits; using them as covariates can improve the estimation of genotype performance and reduce the experiment's random error. For example, when crops are damaged by unusual meteorological events or by external agents such as birds or rodents, some plants are missing from the experiment, so the number of plants is a good covariate to include. In maize under stress conditions, it is very common to adjust yield by anthesis date.
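A sketch of the classical analysis-of-covariance adjustment, assuming a one-way layout (blocks ignored for brevity) and a single covariate such as anthesis date; the function `adjust_by_covariate` and the data are hypothetical. The slope b is the pooled within-genotype regression of the trait on the covariate, and each adjusted mean is the trait mean - b × (covariate mean of the genotype - overall covariate mean).

```python
def adjust_by_covariate(data):
    """Adjust genotype means of a trait for one covariate (one-way ANCOVA).

    data maps each genotype to a list of (covariate, trait) pairs, one per plot.
    Returns (b, {genotype: adjusted_mean}), where b is the pooled
    within-genotype slope of trait on covariate.
    """
    sxy = sxx = 0.0
    means = {}
    all_x = []
    for g, pairs in data.items():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        means[g] = (mx, my)
        # Accumulate within-genotype cross-products and covariate sums of squares.
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x in xs)
        all_x.extend(xs)
    b = sxy / sxx
    grand_x = sum(all_x) / len(all_x)
    return b, {g: my - b * (mx - grand_x) for g, (mx, my) in means.items()}


# Hypothetical plots: (anthesis date in days, yield in t/ha).
trial = {
    "G1": [(60, 5.0), (62, 4.0)],
    "G2": [(70, 3.0), (72, 2.0)],
}
slope, adjusted = adjust_by_covariate(trial)
print(slope, adjusted)
```

Here the within-genotype slope is -0.5 t/ha per day, so the later-flowering G2 moves from the lower raw mean (2.5 versus 4.5) to the higher adjusted mean (5.0 versus 2.0): the ranking can change once the covariate is accounted for.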

Genetic Correlations
META-R calculates matrices of phenotypic and genetic correlations among locations and between traits.
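For correlations among locations, one commonly used approximation divides the phenotypic correlation of genotype means between two locations by the square root of the product of the two locations' heritabilities. The sketch below assumes this formula and hypothetical inputs; the function names are illustrative, not META-R's API. Estimates can fall outside [-1, 1] when heritabilities are low or few genotypes are tested, which signals imprecision rather than a real correlation above 1.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def genetic_corr_locations(means_loc1, means_loc2, h2_loc1, h2_loc2):
    """Approximate genetic correlation between two locations.

    means_loc1, means_loc2: genotype means (same genotype order) in each location.
    h2_loc1, h2_loc2: heritabilities estimated in each location.
    Returns (phenotypic correlation, approximate genetic correlation); the
    second value is not truncated to [-1, 1].
    """
    r_p = pearson(means_loc1, means_loc2)
    return r_p, r_p / math.sqrt(h2_loc1 * h2_loc2)


loc1 = [5.0, 4.0, 6.0, 3.0]  # hypothetical genotype means at location 1, t/ha
loc2 = [4.0, 5.0, 5.0, 3.0]  # same genotypes at location 2
print(genetic_corr_locations(loc1, loc2, h2_loc1=0.81, h2_loc2=0.64))
```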
