WGGBS in pea, Workshop. INRA, UMR 1349 IGEPP, BP35327, Le Rheu Cedex 35653, France 2


1 WGGBS in pea, a successful discoSnp use without genome reduction and at low sequence coverage
Gilles Boutet 1,6, Alves Carvalho S. 1, Peterlongo P. 2, Falque M. 3, Lhuillier E. 4, Bouchez O. 4, Lavaud C. 1, Uricaru R. 1,3, Lesne A. 1, Pilet-Nayel M-L. 1,6, Rivière N. 5, and Baranger A. 1,6
1 INRA, UMR 1349 IGEPP, BP35327, Le Rheu Cedex 35653, France
2 INRIA Rennes - Bretagne Atlantique/IRISA, EPI GenScale, Rennes, France
3 INRA, UMR 320 GQE Le Moulon, INRA - Univ Paris-Sud - CNRS - AgroParisTech, Ferme du Moulon, Gif-sur-Yvette, France
4 GeT-PlaGe, Genotoul, INRA Auzeville F31326, Castanet-Tolosan, France
5 Biogemma, route d'Ennezat, CS90126, Chappes 63720, France
6 PISOM, UMT INRA/Terres Inovia, BP35327, Le Rheu Cedex 35653, France
Workshop November 8, 2016

2 Introduction
Pisum sativum: 4.3 Gb complex genome
No pea reference genome / Limited genomic and sequencing resources
Significant challenges against biotic stress: breakthrough needed by breeders in Marker Assisted Selection and fine mapping
Massive sequencing & SNP marker development with two complementary NGS approaches:
1. Transcriptome sequencing of full-length standardized cDNA for 8 pea genotypes (Duarte et al. 2014)
2. WGGBS of gDNA for pea lines and 48 RILs segregating for a major biotic stress resistance (Boutet et al., BMC Genomics 2016)

3 WGGBS CHALLENGE
Complete genomic DNA (i.e. WITHOUT GENOME REDUCTION) of the 4.3 Gb pea complex genome
Quick, confident and high-throughput SNP DISCOVERY, GENOTYPING and MAPPING
by SHORT READ sequencing at low to very LOW COVERAGE

4 WGGBS CHALLENGE
Complete genomic DNA (i.e. WITHOUT GENOME REDUCTION) of the 4.3 Gb pea complex genome
Quick, confident and high-throughput SNP DISCOVERY, GENOTYPING and MAPPING
by SHORT READ sequencing at low to very LOW COVERAGE
No reference genome
Data assembly too difficult (memory and computation time)

5 discoSnp (Uricaru et al., NAR 2015)
Reference-free SNP discovery (i.e. no reference genome needed)
No data assembly needed
Very quick & very low memory
1st module - Kissnp2: SNP discovery on cleaned reads (based on the de Bruijn graph)
2nd module - Kissreads: improving Kissnp2 results by calculating, for each read and each SNP:
1) coverage
2) quality of reads involved in polymorphism detection
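
The idea behind SNP discovery on a de Bruijn graph can be illustrated with a toy example. The sketch below is not the Kissnp2 implementation: it assumes error-free reads, forward strand only, and an isolated SNP, and the function names (`kmers`, `successors`, `snp_bubbles`) are made up for illustration. An isolated SNP appears as a "bubble": two paths of k k-mers that leave the same node and differ only at their central base.

```python
from itertools import combinations


def kmers(reads, k):
    """Return the set of all k-mers seen in the reads (forward strand only)."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def successors(node, nodes):
    """k-mers of the graph that overlap `node` by k-1 bases."""
    return [node[1:] + b for b in "ACGT" if node[1:] + b in nodes]


def snp_bubbles(reads, k):
    """Find pairs of (2k-1)-long paths that differ only at their middle base."""
    nodes = kmers(reads, k)
    bubbles = []
    for node in sorted(nodes):
        paths = []
        for cur in successors(node, nodes):
            seq = cur
            # Extend each branch by k-1 k-mers, refusing any further branching.
            for _ in range(k - 1):
                nxt = successors(cur, nodes)
                if len(nxt) != 1:
                    break
                cur = nxt[0]
                seq += cur[-1]
            else:
                paths.append(seq)
        # Two branches form a SNP bubble when their right flanks are identical.
        for a, b in combinations(paths, 2):
            if a[k:] == b[k:]:
                bubbles.append((a, b))
    return bubbles
```

With the two toy "reads" `ACGGTCATTGCAGCT` and `ACGGTCAGTGCAGCT` and k = 5, `snp_bubbles` returns the single pair `("GTCAGTGCA", "GTCATTGCA")`, i.e. a G/T SNP with its two 4-base flanks.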

6 WGGBS Strategy
HiSeq 2000 sequencing
discoSnp: SNP discovery and genotyping
CarthaGene R scripts: mapping & KASP™ validation

7 WGGBS Strategy
HiSeq 2000 sequencing
1. 7x / line sequences, 4 pea lines
discoSnp: SNP discovery and genotyping
2. SNP discovery and SNP selection on polymorphism Baccara / PI
CarthaGene R scripts: mapping & KASP™ validation

8 WGGBS Strategy
HiSeq 2000 sequencing
1. 7x / line sequences, 4 pea lines
3. 3.5x / line sequences, 48 RILs
discoSnp: SNP discovery and genotyping
2. SNP discovery and SNP selection on polymorphism Baccara / PI
4. Sequence genotyping on 48 RILs for selected SNPs
CarthaGene R scripts: mapping & KASP™ validation

9 WGGBS Strategy
HiSeq 2000 sequencing
1. 7x / line sequences, 4 pea lines
3. 3.5x / line sequences, 48 RILs
discoSnp: SNP discovery and genotyping
2. SNP discovery and SNP selection on polymorphism Baccara / PI
4. Sequence genotyping on 48 RILs for selected SNPs
CarthaGene R scripts: mapping & KASP™ validation
5. Genetic bin mapping: 65 K markers mapping matrix and custom CarthaGene R scripts, with Duarte et al. markers (176 RILs genotyping data)

10 WGGBS Strategy
HiSeq 2000 sequencing
1. 7x / line sequences, 4 pea lines
3. 3.5x / line sequences, 48 RILs
3.5x / line to 40x / line available sequences, 22 pea lines
discoSnp: SNP discovery and genotyping
2. SNP discovery and SNP selection on polymorphism Baccara / PI
4. Sequence genotyping on 48 RILs for selected SNPs
4. Sequence genotyping on 22 pea lines for selected SNPs
CarthaGene R scripts: mapping & KASP™ validation
5. Genetic bin mapping: 65 K markers mapping matrix and custom CarthaGene R scripts, with Duarte et al. markers (176 RILs genotyping data)

11 WGGBS Strategy
HiSeq 2000 sequencing
1. 7x / line sequences, 4 pea lines
3. 3.5x / line sequences, 48 RILs
3.5x / line to 40x / line available sequences, 22 pea lines
discoSnp: SNP discovery and genotyping
2. SNP discovery and SNP selection on polymorphism Baccara / PI
4. Sequence genotyping on 48 RILs for selected SNPs
4. Sequence genotyping on 22 pea lines for selected SNPs (polymorphism)
CarthaGene R scripts: mapping & KASP™ validation
5. Genetic bin mapping: 65 K markers mapping matrix and custom CarthaGene R scripts, with Duarte et al. markers (176 RILs genotyping data), giving the genetic position
6. Positional and polymorphic selection of 1000 SNPs in QTL regions of interest
7. KASP™ genotyping on 1438 pea lines

12 WGGBS / discoSnp SNP discovery Results
discoSnp on 4 pea parental lines, 7x reads (Baccara, PI180693, Champagne, Terese)
> SNPs
Specific homozygous line pipeline:
Minor allele coverage filter: putative false heterozygotes removed
SNPs with missing data removed
> highly confident SNPs
SNP context sequence filter
> highly designable SNPs
Polymorphism Baccara / PI filter
More than SNPs
1 CPU, 4 GB RAM, 48 h
Boutet et al.
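
The homozygous-line filters listed above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the input layout (per-line read coverage of the two alleles), the function names and the `max_minor` threshold are assumptions, and the SNP context sequence filter is left out.

```python
def call_homozygous(cov_a, cov_b, max_minor=0):
    """Call one line at one SNP from the read coverage of its two alleles."""
    if cov_a == 0 and cov_b == 0:
        return None  # missing data
    # Ties or too many minor-allele reads: treated as a putative false heterozygote.
    if cov_a == cov_b or min(cov_a, cov_b) > max_minor:
        return "H"
    return "A" if cov_a > cov_b else "B"


def filter_snps(coverages, parent_1, parent_2, max_minor=0):
    """Keep SNPs with no missing or heterozygous call and that differ between two parents.

    coverages: {snp_id: {line_name: (coverage_allele_A, coverage_allele_B)}}
    """
    kept = {}
    for snp, per_line in coverages.items():
        calls = {line: call_homozygous(a, b, max_minor)
                 for line, (a, b) in per_line.items()}
        if any(c is None or c == "H" for c in calls.values()):
            continue  # minor allele coverage filter and missing data filter
        if calls[parent_1] != calls[parent_2]:
            kept[snp] = calls  # polymorphism filter between the two parents
    return kept
```

For inbred lines, any call with reads on both alleles is more likely a sequencing or paralogy artefact than a true heterozygote, which is why such SNPs are discarded rather than genotyped as "H".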

13 WGGBS / Genotyping Results
discoSnp Kissreads module genotyping for the 88 k polymorphic Baccara / PI selected SNPs on 48 RIL reads (3.5x HiSeq 2000 sequencing / RIL)
24 CPU, 4 GB RAM, 20 h
SNPs present for at least 43 RILs / 48
SNP mean coverage between 4x and 6x
[Figure: number of SNPs and % of the 88 k SNPs, by number of missing genotyping data]
[Figure: mean coverage on 48 RILs for the 88 k SNPs]
Boutet et al.
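
The two per-SNP statistics shown on this slide (missing genotypes and mean coverage across the RILs) can be computed as in the sketch below. The function name and input layout are hypothetical, and it assumes that a RIL with zero reads at a SNP counts as missing and that the mean is taken over all RILs.

```python
def summarize_snp(ril_coverages, min_present=43):
    """Summarize one SNP from its read coverage in each RIL.

    ril_coverages: list of total read counts at the SNP, one value per RIL.
    min_present:   minimum number of RILs with data (43 of 48 on the slide).
    """
    n_total = len(ril_coverages)
    n_missing = sum(1 for c in ril_coverages if c == 0)
    mean_cov = sum(ril_coverages) / n_total if n_total else 0.0
    return {
        "missing": n_missing,
        "mean_coverage": mean_cov,
        "keep": n_total - n_missing >= min_present,
    }
```

A SNP covered 4 times in 45 RILs and absent from 3 RILs gives 3 missing genotypes, a mean coverage of 3.75x, and passes the 43/48 presence threshold.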

14 WGGBS / Mapping Results
Custom CarthaGene R scripts: genetic bin mapping on:
300 Duarte et al. markers (SSR...) genotyped on 176 RILs
600 Duarte et al. transcriptional SNPs genotyped on 92 RILs
new WGGBS gDNA SNPs genotyped on 48 RILs
BP-WGGBS individual map: 1027 cM markers
Very good colinearity with the Duarte et al. reference consensus map
[Figure: LGIII, BP-WGGBS map compared with the Duarte et al. map (BP-Duarte)]
Boutet et al.
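
One common reading of "genetic bin" is a set of markers that share the same segregation pattern across the RILs and therefore cannot be ordered relative to each other. The sketch below illustrates only that grouping step. It is an assumption about the general idea, not a description of the custom CarthaGene R scripts, and it ignores missing data and genotyping errors.

```python
from collections import defaultdict


def group_into_bins(genotypes):
    """Group markers whose genotype vectors across the RILs are identical.

    genotypes: {marker_id: sequence of calls, one per RIL, e.g. "AABB"}
    Returns a list of bins, each bin being a sorted list of marker ids.
    """
    bins = defaultdict(list)
    for marker, calls in genotypes.items():
        bins[tuple(calls)].append(marker)
    return [sorted(markers) for markers in bins.values()]
```

With only 48 RILs, many of the tens of thousands of SNPs necessarily fall into the same bin, so the number of bins, not the number of markers, limits map resolution.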

15 WGGBS / Mapping Results
HIGH SNP DENSITY AND DISTRIBUTION ALONG THE PEA LINKAGE GROUPS
Boutet et al.

16 WGGBS / Mapping Results
Boutet et al. 2016: Boutet G., Alves Carvalho S., Peterlongo P., Falque M., Lhuillier E., Bouchez O., Lavaud C., Uricaru R., Lesne A., Pilet-Nayel M-L., Rivière N., and Baranger A.

17 Marker densification in QTL Regions
PsLGVII MetaQTL region controlling partial resistance to A. euteiches
M. truncatula pseudochromosome 4 syntenic region
Hamon et al. MetaQTL Aph: 9 markers, 56.2 cM
Boutet et al.

18 Marker densification in QTL Regions
PsLGVII MetaQTL region controlling partial resistance to A. euteiches
M. truncatula pseudochromosome 4 syntenic region
Hamon et al. MetaQTL Aph: 9 markers, 56.2 cM
BP-WGGBS map: 2477 markers, 23.6 cM
Boutet et al.

19 Marker densification in QTL Regions
PsLGVII MetaQTL region controlling partial resistance to A. euteiches
M. truncatula pseudochromosome 4 syntenic region
Hamon et al. MetaQTL Aph: 9 markers, 56.2 cM
BP-WGGBS map: 2477 markers, 23.6 cM
Duarte et al. consensus map: 40 bridge markers
Boutet et al.

20 WGGBS SNP polymorphism level
64 K SNPs polymorphism for 16 parental pairs
discoSnp (Kissreads) genotyping for the 64 k BP-WGGBS SNPs:
HiSeq 2000 7x reads for 4 new pea lines (Boutet et al. 2016)
HiSeq 2000 reads for 16 pea lines downloaded from the ENA database
Hiseq x reads for 2 new pea lines provided by USDA (unpublished)
[Figure: % of polymorphic SNPs for each of the 15 pairs of mapping population parents and 1 pair of close spring field pea varieties]
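
The polymorphism rate of a parental pair is simply the share of SNPs at which the two lines carry different alleles. A minimal sketch, assuming homozygous calls coded "A"/"B" with `None` for missing data, and a hypothetical function name:

```python
def percent_polymorphic(calls_1, calls_2):
    """Percentage of SNPs genotyped in both lines at which the two lines differ.

    calls_1, calls_2: {snp_id: "A", "B" or None (missing)}
    """
    shared = [s for s in calls_1
              if calls_1[s] is not None and calls_2.get(s) is not None]
    if not shared:
        return 0.0
    n_diff = sum(calls_1[s] != calls_2[s] for s in shared)
    return 100.0 * n_diff / len(shared)
```

SNPs missing in either parent are excluded from the denominator, so lines sequenced at very different depths remain comparable.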

21 WGGBS SNP polymorphism level
64 K SNPs polymorphism for 16 parental pairs
For most parental pairs:
Between 30% and 45% of polymorphic markers
Polymorphism homogeneous between LGs
[Figure: % of polymorphic SNPs for each of the 15 pairs of mapping population parents and 1 pair of close spring field pea varieties]

22 WGGBS SNP polymorphism level
64 K SNPs polymorphism for 16 parental pairs
Over 60% polymorphic SNPs between a fodder pea and a dry pea
Less than 15% polymorphic SNPs between two spring field peas
[Figure: % of polymorphic SNPs for each of the 15 pairs of mapping population parents and 1 pair of close spring field pea varieties]

23 WGGBS SNP polymorphism level
64 K SNPs polymorphism for 16 parental pairs
BUT... in a few cases, polymorphism rate heterogeneous between LGs
[Figure: % of polymorphic SNPs for each of the 15 pairs of mapping population parents and 1 pair of close spring field pea varieties]

24 WGGBS SNP polymorphism level
64 K SNPs polymorphism for 16 parental pairs
[Figure, two panels showing polymorphic and monomorphic SNPs along a linkage group (140 cM): one parental pair with a monomorphic region of over 50 cM, one parental pair with no monomorphic region]
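
A monomorphic region such as the one shown here can be located by scanning the mapped SNPs in genetic order. The sketch below is illustrative only: the function name and the input layout (markers sorted by cM position with a polymorphic/monomorphic flag) are assumptions.

```python
def longest_monomorphic_stretch(markers):
    """Length in cM of the longest run of consecutive monomorphic markers.

    markers: list of (position_cM, is_polymorphic), sorted by position.
    """
    best = 0.0
    start = None  # position of the first marker of the current monomorphic run
    for position, is_polymorphic in markers:
        if is_polymorphic:
            start = None
        else:
            if start is None:
                start = position
            best = max(best, position - start)
    return best
```

For a parental pair, a long stretch (for example over 50 cM) suggests a chromosome segment shared by descent, where no QTL can be detected in the corresponding population.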

25 WGGBS SNP polymorphism
Parental relationship & genome structure


27 KASP™ genotyping of 1000 WGGBS SNPs selected in QTL controlling resistance to stresses
Over 95% of KASP™ SNPs succeeded in genotyping 1.5K pea lines, including 7 complete RIL populations and 2 GWAS-dedicated panels
Considering the 48 RILs genotyped both by WGGBS and by KASP™: 98.5% similarity between the two sets of data

28 KASP™ genotyping of 1000 WGGBS SNPs selected in QTL controlling resistance to stresses
0.5% of data in CONFLICT A/B... almost all in the same pea line
1% of data in CONFLICT A/H or B/H... almost all in heterozygous regions
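
The comparison between the two genotyping methods can be expressed as a small concordance table. A minimal sketch, assuming calls coded "A", "B", "H" or `None` (missing) and keyed by (SNP, RIL), with hypothetical names:

```python
from collections import Counter


def compare_calls(wggbs, kasp):
    """Percentage of matches and of each conflict type between two call sets.

    wggbs, kasp: {(snp_id, ril_id): "A", "B", "H" or None (missing)}
    Only data points genotyped by both methods are counted.
    """
    counts = Counter()
    for key, call_1 in wggbs.items():
        call_2 = kasp.get(key)
        if call_1 is None or call_2 is None:
            continue
        if call_1 == call_2:
            counts["match"] += 1
        elif "H" in (call_1, call_2):
            counts["het_conflict"] += 1  # A/H or B/H
        else:
            counts["hom_conflict"] += 1  # A/B
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * counts[name] / total
            for name in ("match", "het_conflict", "hom_conflict")}
```

Separating A/B conflicts from A/H and B/H conflicts matters because the two have different likely causes: the former points to a sample issue in one line, the latter to residual heterozygosity that low-coverage sequencing cannot resolve.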

29 CONCLUSION
A powerful whole genome analysis process: NGS short reads, discoSnp, R scripts
dedicated to non-model and orphan crops... or other species (and especially suitable for homozygous inbred lines)
Generated an unprecedented resource in pea: BP-WGGBS markers map
anchored to the Duarte et al. reference consensus map
anchored to the Tayeh et al. composite map and to the M. truncatula genome
suitable for QTL study, M.A.S. breeding and fine mapping

30 Thank You
INRA UMR 1349 IGEPP (Rennes): Susete Alves Carvalho (Genouest platform)*, Raluca Uricaru (INRIA GenScale team)*, Angelique Lesné, Clement Lavaud, Marie-Laure Pilet-Nayel, Alain Baranger
*Sofiproteol funding on PEAPOL project
INRA UMR 320 GQE (Gif-sur-Yvette): Matthieu Falque (R-CarthaGene)
INRA, plateforme GeT-PlaGe (Toulouse): Olivier Bouchez, Emeline Lhuillier
INRIA/IRISA (Rennes): Olivier Collin (Genouest platform), Pierre Peterlongo (GenScale team)
Biogemma (Clermont-Ferrand): Nathalie Rivière
Financial support of SOFIPROTEOL under the Project PEAPOL
