Genomic prediction of complex phenotypes: Driving innovation in the Brazilian forest based industry. Dario Grattapaglia

1 Genomic prediction of complex phenotypes: Driving innovation in the Brazilian forest based industry Dario Grattapaglia

2 Ron Sederoff's legacy at the IUFRO Tree Biotechnology conference, Brazil 2011 (actually just a small part of it). Ron Sederoff, the father of tree biotechnology.

3 The global area of planted forests is still very small compared to the demand for wood: 270 million hectares, 8% of the world's forest area and 2% of land use, but it has grown over the last 20 years. Source: FAO Global Forest Resources Assessment 2015

4 Eucalyptus: a global tree

5 Land use in Brazil (851 million ha in total): preserved areas and other uses, 529 million ha; arable land, 315 million ha; all crops, 72 million ha; planted forests, 7 million ha (less than 1%). Source: IBGE (2011)

6 Eucalypt fiber farms in tropical sites. Current realized mean annual increment (MAI) in industrially managed Eucalyptus forests in Brazil: 45 m³/ha/year, versus 15 m³/ha/year for loblolly pine in the SE USA. Productivity at rotation age (6 years): 270 m³/ha.
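
The figures on this slide are linked by simple arithmetic: MAI is volume divided by stand age, so volume at rotation is MAI × rotation age. A minimal Python sketch, assuming a constant MAI over the rotation (an idealization, since real growth curves are not linear), also asks how long loblolly pine at its MAI would need to accumulate the same volume:

```python
def volume_at_rotation(mai_m3_per_ha_year: float, rotation_years: float) -> float:
    """Standing volume (m3/ha) at harvest, assuming a constant mean annual increment."""
    return mai_m3_per_ha_year * rotation_years


def years_to_reach(volume_m3_per_ha: float, mai_m3_per_ha_year: float) -> float:
    """Years needed to accumulate a given volume at a constant MAI."""
    return volume_m3_per_ha / mai_m3_per_ha_year


# Figures from the slide: Brazilian Eucalyptus (45 m3/ha/yr, 6-year rotation)
# versus loblolly pine in the SE USA (15 m3/ha/yr).
eucalypt_volume = volume_at_rotation(45, 6)
pine_years = years_to_reach(eucalypt_volume, 15)
print(f"Eucalyptus at 6 years: {eucalypt_volume} m3/ha")
print(f"Loblolly pine needs {pine_years} years to reach the same volume")
```

At one third of the MAI, the pine stand needs three times as long (18 years) to carry what a eucalypt stand carries at 6 years, under this constant-MAI assumption.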

7 Evolution of eucalypt planted forest productivity in Brazil (figure: productivity in m³/ha/yr by decade). Productivity tripled in 50 years. Technologies: tree breeding; clonal propagation; soil management; mechanized harvest; nutrition; minimal cultivation (no-till farming); integrated pest management, combining biological and chemical control; shared knowledge through networks of companies, universities and Embrapa. Land area needed to produce 1 million tons of cellulose: 170,000 ha in 1970; 100,000 ha in 2000.

8 Challenges for planted forests: more wood on less land. Projected 9.5 billion people; 10 billion m³ of wood needed; consumption 3× current; 50% has to come from planted forests; million hectares of planted forests: who? how? where? Increased income and consumption in developing countries; increased demand for products and services from forests. Source: WBCSD, WWF, FAO.

9 Forest tree breeding: trees are largely undomesticated, with lots of genetic variation; long breeding cycles and poor juvenile-mature correlations; logistically complex, with large areas and multiple sites; late-expressing traits and delayed flowering; an extended time lag between the breeding investment and the deployment of genetically improved material; a costly operation, more susceptible to changes in market demands, business objectives and climate change.

10 The breeder's equation:
$$\Delta G = \frac{i \, r \, \sigma_A}{L}$$
where ΔG = genetic gain per unit time; i = selection intensity; r = selection accuracy (correlation between estimated breeding value and true BV); σ_A = additive genetic standard deviation (additive genetic variation available); L = breeding cycle length.
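
The breeder's equation makes the trade-off behind genomic selection explicit: gain per year is i × r × the additive genetic standard deviation (σ_A), divided by cycle length L, so a shorter cycle can outweigh a lower accuracy. A minimal Python sketch, assuming truncation selection on a normal distribution for i; the accuracies (0.8 conventional, 0.6 genomic) are hypothetical, and the 18- and 9-year cycles follow the financial analysis on slide 27:

```python
from statistics import NormalDist


def selection_intensity(proportion_selected: float) -> float:
    """Intensity i for truncation selection of the top fraction of a normal distribution.

    i is the mean of the selected group in standard-deviation units: phi(z) / p,
    where z is the truncation point leaving a fraction p in the upper tail.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - proportion_selected)
    return std_normal.pdf(z) / proportion_selected


def gain_per_year(i: float, r: float, sigma_a: float, cycle_years: float) -> float:
    """Breeder's equation expressed per unit time: i * r * sigma_A / L."""
    return i * r * sigma_a / cycle_years


i = selection_intensity(0.05)  # keep the top 5%
sigma_a = 1.0                  # express gain in additive-SD units
# Hypothetical accuracies: phenotypic selection r = 0.8 over an 18-year cycle,
# genomic selection r = 0.6 over a 9-year cycle.
conventional = gain_per_year(i, 0.8, sigma_a, 18)
genomic = gain_per_year(i, 0.6, sigma_a, 9)
print(f"i = {i:.3f}; conventional {conventional:.4f} SD/yr; genomic {genomic:.4f} SD/yr")
```

With these assumed values, genomic selection delivers 1.5 times the annual gain despite its lower accuracy, because (0.6 / 9) exceeds (0.8 / 18).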

11 Newly selected elite clone at age 2 versus currently planted elite clone at age 2: advanced breeding and selection have a great impact on forest productivity. Photograph: Fibria

12 Steps taken for the selection of new elite clones. Selection parameters: trunk and crown form, % bark, productivity (Adt/ha), wood quality, disease resistance, financial margin. Hybrid mating → selection of the best families and best trees in hybrid progeny trials (growth, form, Pilodyn density and NIRS) → best trees are felled → production of cuttings for the first clonal trial → selection of top clones in the first clonal trial (growth, form, density and NIRS) → production of cuttings for the expanded clonal trial → selection of top clones in the expanded clonal trial (growth, density, disease resistance, wood quality) → production of cuttings for minicutting expansion → production of clonal plants for planting → commercial forests. Even in fast-growing Eucalyptus this process takes between 12 and 16 years. Early selection methods for late-expressing and hard-to-measure traits, especially wood quality and disease resistance, would be very useful.

13 The longest-standing question in genetics: how does genetic variation contribute to phenotypic variation? Mendelians focused on discrete, monogenic phenotypes; biometricians did not believe that Mendelian genetics could explain complex traits. The debate was resolved in a 1918 paper by R.A. Fisher proposing the infinitesimal model: many genes affect a trait, producing a continuous, normally distributed phenotype in the population. Molecular biologists believe in the widespread existence of single genes of large effect controlling complex phenotypes; quantitative geneticists devised statistical methods to treat complex traits by partitioning variances. GENOMICS now allows convergence!!
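
Fisher's argument can be illustrated with a small simulation, a sketch under hypothetical settings (unlinked loci, equal additive effects, allele frequency 0.5, no environmental noise). With one locus, genetic values fall into three discrete classes; with 200 loci of small effect, they spread into a near-continuous, bell-shaped distribution:

```python
import random


def genetic_values(n_loci: int, n_individuals: int, allele_freq: float = 0.5, seed: int = 1):
    """Additive genetic values under a simple model with equal, small effects per locus.

    Each individual carries 0, 1 or 2 copies of the '+' allele at every locus
    (two independent draws with probability allele_freq). The total count is
    divided by n_loci so the trait range stays comparable across models.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_individuals):
        plus_alleles = sum(
            (rng.random() < allele_freq) + (rng.random() < allele_freq)
            for _ in range(n_loci)
        )
        values.append(plus_alleles / n_loci)
    return values


one_gene = genetic_values(n_loci=1, n_individuals=2000)
many_genes = genetic_values(n_loci=200, n_individuals=2000)
print("1 locus:", len(set(one_gene)), "distinct classes")
print("200 loci:", len(set(many_genes)), "distinct classes")
```

A histogram of `many_genes` would show the bell shape; adding environmental noise would smooth the remaining discreteness entirely.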

14 THE CONVERGENCE OF GENOMICS AND QUANTITATIVE GENETICS. In advanced tree breeding we are moving from trying to discover genes and determine their individual effects to dealing with the full aggregate effect of the entire genome.

15 Genomics and prediction

16 Genomic Selection, put simply: select on thousands of DNA markers across the entire genome so that ALL gene effects are captured in a predictive model. (Figure: genes and DNA markers.)

17 Breeding cycle with genomic selection. Development of the predictive model: a training population (e.g., a progeny trial of N ≈ 2,000 from a breeding population with Ne ≈ 60) is genotyped (SNP data) and phenotyped in field trials (trait data); these data fit the predictive model Y = Xb + Zh + e, which is checked by cross-validation on a validation population and updated over time. Genomic selection: selection candidates, young seedlings genotyped but not phenotyped (e.g., 100 full- or half-sib families of 100 offspring each = 10,000 seedlings), are ranked by genomic estimated breeding value (GEBV). The selected seedlings (top 5% ranked by GS) go to flower induction for the next breeding cycle and to clonal trials that yield elite clones.
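
The cycle on this slide can be sketched end to end on simulated data. This is a minimal illustration, not the authors' pipeline: it assumes unlinked SNPs coded 0/1/2, a purely additive trait with heritability around 0.5, and ridge-regression BLUP (RR-BLUP), in which the random genetic term of Y = Xb + Zh + e is modeled as a sum of SNP effects and the only fixed effect is an overall mean. Population sizes, the seed and the shrinkage parameter `lam` are hypothetical choices scaled down from the slide's example.

```python
import random
from statistics import pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def simulate(n, effects, rng):
    """Random 0/1/2 genotypes and their true additive genetic values."""
    geno = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in effects] for _ in range(n)]
    return geno, [sum(g * e for g, e in zip(row, effects)) for row in geno]


def fit_rr_blup(geno, y, lam):
    """Ridge (RR-BLUP) marker effects: (Z'Z + lam*I) beta = Z'(y - mean(y)), Z centred."""
    n, p = len(geno), len(geno[0])
    centre = [sum(row[j] for row in geno) / n for j in range(p)]
    z = [[row[j] - centre[j] for j in range(p)] for row in geno]
    y_bar = sum(y) / n
    lhs = [[sum(z[i][j] * z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(z[i][j] * (y[i] - y_bar) for i in range(n)) for j in range(p)]
    return centre, solve(lhs, rhs)


def gebv(geno, centre, beta):
    """Genomic estimated breeding values: centred genotypes times marker effects."""
    return [sum((g - c) * b for g, c, b in zip(row, centre, beta)) for row in geno]


rng = random.Random(7)
effects = [rng.gauss(0, 1) for _ in range(50)]            # 50 SNPs with true effects
train_geno, train_g = simulate(200, effects, rng)          # training population
noise_sd = pstdev(train_g)                                 # noise SD = genetic SD, so h2 is about 0.5
train_y = [g + rng.gauss(0, noise_sd) for g in train_g]    # phenotypes
# lam = residual variance / marker-effect variance, roughly 25 / 1 in this simulation
centre, beta = fit_rr_blup(train_geno, train_y, lam=25.0)

cand_geno, cand_g = simulate(400, effects, rng)            # candidates: genotyped, not phenotyped
scores = gebv(cand_geno, centre, beta)
ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
selected = ranked[: len(scores) * 5 // 100]                # top 5% by GEBV
accuracy = pearson(scores, cand_g)
print(f"selected {len(selected)} seedlings; accuracy of GEBV = {accuracy:.2f}")
```

Because the candidates' true genetic values are known in a simulation, the correlation between GEBVs and those values is a direct measure of accuracy; with real seedlings it can only be estimated, for example by cross-validation on the training population.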

18 Genomic Selection: an operational technology in animal breeding. Conventional progeny-test-based breeding versus a Genomic Selection-based program (genomic bulls). Van Eenennaam 2014, Annual Review of Animal Biosciences

19 We started Genomic Selection in forest trees in 2007, with the first experimental results in Eucalyptus in 2009: Cenibra population (Ne = 11) and Fibria population (Ne = 51). (Table: for diameter, height, wood density and pulp yield in each population, heritability from pedigree, number of individuals, predictive ability, and accuracy of genomic BLUP versus phenotypic BLUP.) Genomic prediction matched phenotypic prediction for all traits. Predictive ability across populations was very low (< 0.2); variable genetic background and G × E were confounded. GS models should be population- and environment-specific. Resende et al., New Phytologist
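
The slide reports both predictive ability and accuracy. A common convention in genomic prediction studies, stated here as an assumption about how the two relate rather than a description of the published computation, divides predictive ability (the correlation between GEBVs and observed phenotypes in validation) by the square root of heritability to approximate accuracy (the correlation with true breeding values):

```python
from math import sqrt


def accuracy_from_predictive_ability(predictive_ability: float, h2: float) -> float:
    """Approximate accuracy r(GEBV, true BV) as r(GEBV, phenotype) / sqrt(h2).

    Phenotypes are noisy proxies of breeding values, so predictive ability is
    bounded by sqrt(h2); dividing by sqrt(h2) rescales it to the accuracy scale.
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    return predictive_ability / sqrt(h2)


# Hypothetical inputs: predictive ability 0.30 for a trait with h2 = 0.36
print(round(accuracy_from_predictive_ability(0.30, 0.36), 3))
```

If heritability is underestimated, this ratio can exceed 1, so it is best read as an approximation tied to the quality of the h2 estimate.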

20 Eucalyptus genome (Myburg, Grattapaglia, Tuskan et al.): genome size 640 Mbp, of which 605 Mbp (94%) is in 11 chromosomes; 36,376 predicted protein-coding genes.

21 Genomic Selection requires a highly efficient DNA marker platform: development of a DNA chip for Eucalyptus. For long-term implementation of GS in Eucalyptus we developed a DNA marker platform with genome-wide marker density; high reproducibility and portability of data; informativeness for the BIG TEN Eucalyptus species; fast data delivery; public access and worldwide use; and low cost per sample. It was crowd-funded by eucalypt forest companies worldwide. We sequenced the genomes of 240 eucalypt trees from 12 different species planted worldwide.

22 The EuCHIP provides high-quality DNA data for 60 thousand markers across the genome. (Figure: genotype clusters for homozygous AA, heterozygous AG, homozygous GG and missing data.) Automated genotyping with stringent genotype-calling parameters; minimal human intervention in data editing (removal of bad samples); reproducibility above 99.99% within and between experiments; user-friendly data files, immediately usable in common software.
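
Before any genomic prediction, chip calls like AA, AG and GG have to become numeric allele dosages, and bad samples have to be removed. A minimal Python sketch, assuming two-letter calls, hypothetical missing-data codes, a per-SNP counted allele chosen by the user, a call-rate filter, and mean imputation of remaining gaps; the real EuCHIP60K export format and pipeline may differ:

```python
# Hypothetical missing-data codes; the actual EuCHIP60K export format may differ.
MISSING_CODES = {"--", "NC", ""}


def encode_call(call, counted_allele):
    """Two-letter SNP call -> copies of counted_allele (0, 1 or 2); None if missing."""
    if call in MISSING_CODES:
        return None
    return call.count(counted_allele)


def encode_and_filter(calls_by_sample, counted_alleles, min_call_rate=0.9):
    """Code every sample and drop those whose call rate is below min_call_rate."""
    kept = {}
    for sample, calls in calls_by_sample.items():
        coded = [encode_call(c, a) for c, a in zip(calls, counted_alleles)]
        call_rate = sum(v is not None for v in coded) / len(coded)
        if call_rate >= min_call_rate:
            kept[sample] = coded
    return kept


def impute_marker_means(coded):
    """Replace each missing value by the mean dosage of that marker among kept samples."""
    samples = list(coded)
    n_markers = len(coded[samples[0]])
    means = []
    for j in range(n_markers):
        observed = [coded[s][j] for s in samples if coded[s][j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return {s: [means[j] if v is None else v for j, v in enumerate(coded[s])]
            for s in samples}


calls = {
    "tree1": ["AA", "AG", "GG", "CT"],
    "tree2": ["GG", "--", "AA", "CC"],
    "tree3": ["--", "--", "AG", "TT"],
}
counted = ["G", "G", "G", "T"]  # allele counted at each SNP (hypothetical)
matrix = impute_marker_means(encode_and_filter(calls, counted, min_call_rate=0.7))
print(matrix)
```

Here tree3 (call rate 0.5) is dropped, and tree2's missing second SNP is filled with that marker's mean dosage among the remaining trees.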

23 GENOMIC PREDICTIONS OF 15 GROWTH AND WOOD TRAITS (figure: predicted versus observed values). Following independent cross-validation, correlations between predicted and observed data were good, as good as or better than direct phenotypic measurements.
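
The independent cross-validation behind these predicted-versus-observed comparisons can be sketched as k-fold partitioning: each fold is predicted by a model trained on the remaining folds, and predictive ability is the correlation between predictions and observed phenotypes within the held-out fold. This sketch assumes simulated data, k = 5 and, to stay short, a deliberately simple stand-in model that sums single-marker regression slopes; it is not the genomic BLUP used in the study, and any model with the same fit/predict shape could be swapped in.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5


def fit_marginal(geno, y):
    """Stand-in model: regress y on each SNP separately and keep every slope."""
    n = len(y)
    y_bar = sum(y) / n
    centre, slopes = [], []
    for j in range(len(geno[0])):
        col = [row[j] for row in geno]
        c = sum(col) / n
        sxx = sum((g - c) ** 2 for g in col)
        sxy = sum((g - c) * (v - y_bar) for g, v in zip(col, y))
        centre.append(c)
        slopes.append(sxy / sxx if sxx > 0 else 0.0)
    return centre, slopes


def predict(geno, centre, slopes):
    """Predicted genetic value: centred genotypes times the fitted slopes."""
    return [sum((g - c) * b for g, c, b in zip(row, centre, slopes)) for row in geno]


def k_fold_predictive_ability(geno, y, k=5, seed=0):
    """Correlation between held-out predictions and phenotypes, one value per fold."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    abilities = []
    for test in folds:
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        model = fit_marginal([geno[i] for i in train], [y[i] for i in train])
        preds = predict([geno[i] for i in test], *model)
        abilities.append(pearson(preds, [y[i] for i in test]))
    return abilities


rng = random.Random(3)
effects = [rng.gauss(0, 1) for _ in range(50)]
geno = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in effects] for _ in range(200)]
y = [sum(g * e for g, e in zip(row, effects)) + rng.gauss(0, 5) for row in geno]
abilities = k_fold_predictive_ability(geno, y, k=5)
print([round(a, 2) for a in abilities], round(sum(abilities) / len(abilities), 2))
```

Every tree is predicted exactly once, by a model that never saw its phenotype, which is what keeps the resulting correlations honest.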

24 Genomic Selection successfully identifies the top trees. (Figure panels: mean annual volume growth, basic wood density and cellulose pulp yield, showing the average genomic value of the top twenty genomically selected trees and the probability of rejecting the null hypothesis (α = 1%) that Genomic Selection would select randomly.) Resende, R.T. et al., Heredity
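
The null hypothesis on this slide, that Genomic Selection picks trees no better than at random, can be illustrated with a resampling comparison: the mean observed value of the top 20 trees by GEBV is compared against the means of many random 20-tree draws. This is a sketch assumed for illustration on simulated stand-in data; the published test may be formulated differently.

```python
import random


def top_k_resampling_test(gebv, observed, k=20, n_draws=2000, seed=0):
    """Is the observed mean of the top-k trees by GEBV higher than for k random trees?

    Returns (mean of top-k by GEBV, one-sided p-value), where the p-value is the
    share of random k-tree draws whose observed mean is at least as high.
    """
    ranked = sorted(range(len(gebv)), key=lambda i: gebv[i], reverse=True)
    top_mean = sum(observed[i] for i in ranked[:k]) / k
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_draws):
        draw = rng.sample(range(len(observed)), k)
        if sum(observed[i] for i in draw) / k >= top_mean:
            as_extreme += 1
    # add-one correction so the p-value is never exactly zero
    return top_mean, (as_extreme + 1) / (n_draws + 1)


# Simulated stand-in data: true values, noisy phenotypes and noisy GEBVs
rng = random.Random(11)
true_values = [rng.gauss(0, 1) for _ in range(500)]
observed = [t + rng.gauss(0, 1) for t in true_values]
gebv = [t + rng.gauss(0, 1) for t in true_values]
top_mean, p_value = top_k_resampling_test(gebv, observed)
print(f"mean of top 20 by GEBV = {top_mean:.2f}, p = {p_value:.4f}")
print("reject random selection at alpha = 1%:", p_value < 0.01)
```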

25 Results in other forest tree species followed, with good genomic prediction abilities across species and traits.
Loblolly pine (Pinus taeda): public dataset and no differences among models (Resende et al. 2012); prediction driven by relatedness (Zapata-Valenzuela et al. 2012); a genomic relationship matrix (GRM) better separates additive and non-additive effects (Muñoz et al. 2014).
White spruce (Picea glauca): prediction strongly dependent on relatedness (Beaulieu et al. 2014a); prediction accuracy across environments varies with trait (Beaulieu et al. 2014b).
Interior spruce (Picea glauca × engelmannii): predictive abilities were good within but unreliable across environments (El-Dien et al. 2015); major genotype × age effect on predictive abilities and no difference across models (Ratcliffe et al. 2015).
Maritime pine (Pinus pinaster): training on parents and progeny, with no difference across models (Isik et al. 2016); good predictions across generations (Bartholomé et al. 2016).
AND MORE RESULTS ARE COMING...
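
The genomic relationship matrix (GRM) mentioned for loblolly pine replaces pedigree relationships with realized relationships computed from markers. One common additive construction is VanRaden's (2008) first method, sketched below in Python on hypothetical toy data; the cited studies may use other variants, including separate dominance matrices.

```python
def vanraden_grm(geno):
    """Genomic relationship matrix G = Z Z' / (2 * sum p_j (1 - p_j)) (VanRaden 2008, method 1).

    geno: list of individuals, each a list of 0/1/2 allele counts per SNP.
    Z is the genotype matrix centred by twice the allele frequency of each SNP.
    """
    n, m = len(geno), len(geno[0])
    p = [sum(row[j] for row in geno) / (2 * n) for j in range(m)]
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in geno]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("no polymorphic markers")
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


# Toy data (hypothetical): three trees, four SNPs.
geno = [
    [0, 1, 2, 1],
    [2, 1, 0, 1],
    [0, 1, 2, 2],
]
for row in vanraden_grm(geno):
    print([round(v, 3) for v in row])
```

Trees 1 and 3, which share most genotypes, get a positive relationship, while tree 2 is negatively related to both relative to the sample mean. With real data, allele frequencies would come from a much larger reference population.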

26 Genomic Selection in tree breeding: time gain from significantly accelerated breeding cycles; improved precision for hard-to-select or late-expressing traits (e.g., wood quality, stem form); selection for ALL traits simultaneously in ALL plants; higher selection intensity.

27 A back-of-the-envelope financial analysis. CONVENTIONAL Eucalyptus BREEDING: 18 YEARS. GENOMIC SELECTION BREEDING TIME SAVINGS: 9 YEARS. How much does GS cost? ~500k US$/generation. How much is it worth to have wood that provides 1% higher pulp yield nine years ahead of time in a 1 million ton pulp mill? 10K ton/year × 800 US$/ton × 9 years = 72 M US$
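
The slide's arithmetic can be written as a small Python function so that the inputs (mill output, yield gain, pulp price, years gained) are easy to vary. A minimal sketch using the slide's figures:

```python
def value_of_earlier_gain(mill_output_t_per_year, yield_gain, price_usd_per_t, years_earlier):
    """Undiscounted value of extra pulp from a yield gain delivered some years earlier."""
    extra_t_per_year = mill_output_t_per_year * yield_gain  # 1% of 1 Mt = 10 kt
    return extra_t_per_year * price_usd_per_t * years_earlier


benefit = value_of_earlier_gain(1_000_000, 0.01, 800, 9)
gs_cost_per_generation = 500_000
print(f"benefit: {benefit / 1e6:.0f} M US$")
print(f"benefit / GS cost: {benefit / gs_cost_per_generation:.0f}x")
```

This keeps the slide's simplifications: no discounting, a constant pulp price, and the full 1% gain applied to the mill's whole output for all nine years. The resulting ratio of about 144 compares one generation's GS cost with that undiscounted benefit.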

28 GENOMIC SELECTION IN FOREST TREES: WHERE ARE WE NOW? (Figure: Gartner hype cycle of new technologies.)

29 Genomic selection in forest trees: current research.
Multi-trait selection: GS index based on economic value.
Inbreeding and reduction of diversity: better management of inbreeding by specifying the Mendelian sampling term; greater impact of reduction of diversity surrounding QTLs due to hitchhiking; weighted GS to reduce loss of rare alleles.
"Moving target" environment ("PAST PERFORMANCE IS NO GUARANTEE OF FUTURE RETURNS"): training in expected future environments (climate change).
Predictive model updating: counterbalance the decay of relationship and LD; changes in trait architecture and environment; continuous validation.
Logistics: flower induction, people, detailed case-by-case cost/benefit analysis.
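
Multi-trait selection with a GS index based on economic value can be sketched as a linear index: each candidate's GEBV for every trait is multiplied by that trait's economic value per unit and summed. This is a simple economic-weight index, not an optimized Smith-Hazel index, and the trait names, weights and GEBVs below are all hypothetical:

```python
def economic_index(gebvs, weights):
    """Linear selection index: sum of GEBV x economic value per unit, over traits."""
    return sum(weights[trait] * gebvs[trait] for trait in weights)


# Hypothetical economic values (US$ per unit of each trait's GEBV)
weights = {"volume": 1.0, "density": 0.5, "pulp_yield": 20.0}
# Hypothetical GEBVs for three candidate seedlings, in trait units
candidates = {
    "seedling_A": {"volume": 12.0, "density": -5.0, "pulp_yield": 0.4},
    "seedling_B": {"volume": 8.0, "density": 10.0, "pulp_yield": 0.1},
    "seedling_C": {"volume": 15.0, "density": -20.0, "pulp_yield": -0.2},
}
index = {name: economic_index(g, weights) for name, g in candidates.items()}
ranking = sorted(index, key=index.get, reverse=True)
for name in ranking:
    print(name, round(index[name], 2))
```

Seedling C has the highest volume GEBV yet ranks last once its poor density and pulp yield are priced in, which is the point of selecting on an economic index rather than on one trait.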

30 "It's difficult to make predictions, especially about the future." (Niels Bohr) "Essentially, all models are wrong, but some are useful." (George E. P. Box) "There are an awful lot of ways for predictions to go wrong thanks to bad incentives and bad methods." (Nate Silver, 2012)

31 Genomic Selection in Eucalyptus: companies in Brazil and around the world are already investing in this new breeding technology using the EuCHIP60K.

32 Acknowledgments: DArT projects; Bioinfo; EuCHIP60K; Phenotyping; GS Prediction and GWAS; Funding; Collaborations. Carolina Sansaloni, Cesar Petroli, Danielle Faria, Orzenil Bonfim, Alexandre Missiaggia, Elizabete Takahashi, Shawn Mansfield (UBC), Marcos Resende, Marcio Resende, Eduardo Cappa, Bruno Lima, Biyue Tan, Barbara Muller, Matias Kirst, Patricio Muñoz, Leandro Neves, Andrzej Kilian, Daniel Pomp, Harry Wu, Pär Ingvarsson

33 Thanks!