Application of Genotyping-By-Sequencing and Genome-Wide Association Analysis in Tetraploid Potato


Sanjeev K Sharma, Cell and Molecular Sciences. The 3rd Plant Genomics Congress, London, 12th May 2015.

Potato genome sequencing: DMDD map; UHD map (Van Os, 2006); PGSC, 2011, Nature; Sharma et al., 2013, G3: Genes, Genomes, Genetics.

SolCAP 8K Infinium SNP array (panels: Stenotomum; cultivated 4x).

Ascertainment bias. Genotyping and genome type: fixed arrays vs flexible platforms (figure panels A to E).

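Genotyping-by-sequencing (GBS) exploits complexity reduction by restriction digestion, so repetitive regions of the genome can be avoided. Each GBS sequence tag starts at a restriction site; SNPs and small INDELs within the tags are genotyped directly, and a variant that causes loss of the cut site removes the tag from that sample (figure: Sample 1 vs Sample 2).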
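The "loss of cut site" case can be illustrated with a minimal in silico digestion. This is a sketch only: the two toy sequences are invented, and PstI is used simply because it is one of the enzymes tested in this study. A single base change inside the first PstI site of Sample 2 removes one cut, so the tag produced from that site in Sample 1 is absent in Sample 2.

```python
"""Minimal sketch: how a SNP inside a restriction site removes a GBS tag.

Assumptions (illustrative only): the toy sequences are invented; PstI
(CTGCA^G) is used as the cutting enzyme.
"""

PSTI_SITE = "CTGCAG"   # PstI recognition sequence (palindromic)
PSTI_CUT_OFFSET = 5    # PstI cuts CTGCA^G, i.e. after the 5th base of the site


def cut_positions(seq: str, site: str = PSTI_SITE, offset: int = PSTI_CUT_OFFSET) -> list[int]:
    """Return the 0-based positions at which the enzyme cuts the top strand."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq: str) -> list[str]:
    """Split seq into fragments at every cut position."""
    cuts = [0] + cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]


sample_1 = "AAATTTCTGCAGGGGCCCAAACTGCAGTTTAAA"
# Sample 2 carries an A->G change in the first PstI site (CTGCAG -> CTGCGG).
sample_2 = "AAATTTCTGCGGGGGCCCAAACTGCAGTTTAAA"

print(len(digest(sample_1)), len(digest(sample_2)))  # 3 2
```

Because the middle fragment of Sample 1 has no counterpart in Sample 2, such a variant is scored as tag presence/absence rather than as a SNP within a shared tag.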

Pros and cons. Pros: simultaneous marker discovery and genotyping; complexity reduction is easy, quick, specific and highly reproducible; simple library preparation, highly multiplexed and less expensive; reduced sample handling, with few PCR and purification steps; no DNA size fractionation; less variation in sample representation than other protocols. Cons: large amount of missing data (43% vs 89%, Elshire et al., 2011, PLoS ONE).

Establishment of GBS in potato: sequencing-ready GBS library (Elshire et al., 2011, PLoS ONE; Poland et al., 2012, PLoS ONE).

Restriction enzyme combinations: the rare cutters PstI (CTGCA*G), NsiI (ATGCA*T) and SbfI (CCTGCA*GG), each combined with one of the frequent cutters BfaI (C*TAG) and MseI (T*TAA); * marks the cut position.
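A rare-cutter/frequent-cutter pair can be compared in silico by listing the fragments a double digest would produce. The sketch below assumes the two-enzyme logic described by Poland et al. (2012), in which only fragments with one rare-cutter end and one common-cutter end are expected to be sequenced; the toy sequence is invented, and the study's actual tag-selection rules may differ.

```python
"""In silico double digest: which fragments would become GBS tags?

Assumptions (illustrative only): the toy sequence is invented; only fragments
with one rare-cutter end and one common-cutter end are kept.
"""

# name: (recognition site, cut offset on the top strand); all sites are palindromic
ENZYMES = {
    "PstI": ("CTGCAG", 5),    # CTGCA^G
    "NsiI": ("ATGCAT", 5),    # ATGCA^T
    "SbfI": ("CCTGCAGG", 6),  # CCTGCA^GG
    "BfaI": ("CTAG", 1),      # C^TAG
    "MseI": ("TTAA", 1),      # T^TAA
}


def find_cuts(seq: str, enzyme: str) -> list[tuple[int, str]]:
    """Return (cut position, enzyme) for every recognition site in seq."""
    site, offset = ENZYMES[enzyme]
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append((i + offset, enzyme))
        i = seq.find(site, i + 1)
    return cuts


def double_digest(seq: str, rare: str, common: str) -> list[tuple[str, str, str]]:
    """Return (fragment, left enzyme, right enzyme) for fragments cut at both ends.

    The pieces before the first cut and after the last cut are sequence ends,
    not restriction fragments, so they are ignored.
    """
    cuts = sorted(find_cuts(seq, rare) + find_cuts(seq, common))
    return [(seq[a:b], left, right) for (a, left), (b, right) in zip(cuts, cuts[1:])]


def gbs_tags(seq: str, rare: str, common: str) -> list[str]:
    """Keep only rare-common fragments, the ones expected to be sequenced."""
    return [frag for frag, left, right in double_digest(seq, rare, common)
            if {left, right} == {rare, common}]


toy = "GGTTAACCCTGCAGAAATTAAGGGCTGCAGCCTTAAGGGTTAAC"
print(len(double_digest(toy, "PstI", "MseI")))  # 5 fragments in total
print(len(gbs_tags(toy, "PstI", "MseI")))       # 4 PstI-MseI fragments
```

Running the same function over a reference genome for each enzyme pair would give a rough, assumption-laden estimate of how many tags each combination yields.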

Different levels of multiplexing: 12-plex, 24-plex, 48-plex and 96-plex.

Consistency in read counts across different multiplexing levels (figure: read counts, axis 0 to 8 million, for 12-plex, 24-plex, 48-plex and 96-plex libraries).

De-multiplexing and read filtering: multiplexing at 24 clones per Illumina HiSeq lane, 150 bp reads. Average categorised reads per library: ~103 million; average categorised reads per sample: ~4.3 million (~103 million / 24).
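De-multiplexing can be sketched as matching each read's leading inline barcode, removing it and labelling the read with the cultivar code. The barcodes and reads below are hypothetical, and the sketch assumes each read begins with a barcode followed by the PstI cut-site remnant TGCAG; the pipeline used in this work may handle barcode errors and read trimming differently.

```python
"""Minimal sketch of barcode de-multiplexing for GBS reads.

Assumptions (illustrative only): barcodes and reads are invented; each read
starts with an inline barcode followed by the PstI cut-site remnant TGCAG.
"""

REMNANT = "TGCAG"

BARCODES = {  # hypothetical barcode -> cultivar code
    "ACGT": "cultivar_A",
    "TGCAT": "cultivar_B",
    "GATC": "cultivar_C",
}


def demultiplex(reads, barcodes=BARCODES, remnant=REMNANT):
    """Assign reads to samples, strip the barcode and prefix the cultivar code."""
    per_sample = {code: [] for code in barcodes.values()}
    unassigned = []
    for read in reads:
        for barcode, code in barcodes.items():
            # Requiring the remnant right after the barcode stops a short
            # barcode from matching the first bases of a longer one.
            if read.startswith(barcode + remnant):
                per_sample[code].append(f"{code}:{read[len(barcode):]}")
                break
        else:
            unassigned.append(read)
    return per_sample, unassigned


reads = ["ACGTTGCAGAAAC", "TGCATTGCAGGGT", "GATCTGCAGCCA", "NNNNTGCAGAAA"]
per_sample, unassigned = demultiplex(reads)

# Slide figures: ~103 million categorised reads per library, 24 clones per lane.
reads_per_sample = 103e6 / 24   # about 4.29 million, consistent with ~4.3 million
```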

GBS bioinformatics analysis workflow: (1) raw reads; (2) FastQC, initial quality control of sequences; (3) barcodes, de-multiplexing of reads, with removal of the index and addition of the cultivar code to each read; (4) alignment of reads to the reference genome sequence; (5) sequence variant calling and genotyping, with processing of read depth and coverage; (6) VCFtools, filtering of genotype calls.

Genome-wide coverage of GBS tags: all 12 chromosomes are covered, with higher coverage in gene-rich regions and lower coverage in centromeric regions (figure: read depth of GBS tags along Chr01 to Chr12, shown alongside genome-wide gene density and repeat-region density).
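The relationship between tag coverage and gene density can be quantified by counting tags and genes in fixed windows and correlating the two. This is a sketch under stated assumptions: the 1 Mb window size and the toy positions are hypothetical, and the slide itself shows the comparison visually rather than as a correlation.

```python
"""Minimal sketch: windowed GBS tag counts vs gene density.

Assumptions (illustrative only): 1 Mb windows; toy positions are invented.
"""
from collections import Counter
from math import sqrt

WINDOW = 1_000_000  # hypothetical window size in bp


def per_window(positions, window=WINDOW):
    """Count items (tags or genes) per (chromosome, window index)."""
    return Counter((chrom, pos // window) for chrom, pos in positions)


def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coverage_vs_gene_density(tag_positions, gene_positions, window=WINDOW):
    """Correlate tag counts and gene counts over every window that has either."""
    tags = per_window(tag_positions, window)
    genes = per_window(gene_positions, window)
    keys = sorted(set(tags) | set(genes))
    return pearson([tags[k] for k in keys], [genes[k] for k in keys])


tags = [("chr01", 100), ("chr01", 200), ("chr01", 300), ("chr01", 1_500_000)]
genes = [("chr01", 50), ("chr01", 60), ("chr01", 1_200_000), ("chr01", 2_500_000)]
r = coverage_vs_gene_density(tags, genes)
```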

Aligned GBS tags

Genotyping and SNP calling: 270,358 SNPs in total; 219,059 in the association panel only; 136,903 after the 20% missing-data cutoff; 117,542 after removing multiallelic sites and applying other filters; 44,700 after the minor allele frequency (MAF) filter. SNP frequency: ~1 per 46 bp.
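The filtering cascade above can be sketched for tetraploid dosage data, where each genotype is an alternative-allele dosage from 0 to 4. The 20% missing-data cutoff comes from the slide; the MAF threshold of 0.05 is a hypothetical value, because the slide does not state it, and the "other filters" are not modelled.

```python
"""Minimal sketch of a SNP filtering cascade for tetraploid dosage calls.

Assumptions (illustrative only): genotypes are alt-allele dosages 0..4 or None
(missing); MIN_MAF = 0.05 is hypothetical; the toy SNPs are invented.
"""

PLOIDY = 4
MAX_MISSING = 0.20   # 20% missing-data cutoff, as on the slide
MIN_MAF = 0.05       # hypothetical; the slide does not give the threshold


def missing_rate(dosages):
    return sum(d is None for d in dosages) / len(dosages)


def minor_allele_frequency(dosages, ploidy=PLOIDY):
    """MAF from called dosages: alt copies / (ploidy * called individuals)."""
    called = [d for d in dosages if d is not None]
    alt = sum(called) / (ploidy * len(called))
    return min(alt, 1 - alt)


def filter_snps(snps):
    """Apply the filters in order and record how many SNPs survive each step."""
    counts = {"total": len(snps)}
    kept = [s for s in snps if missing_rate(s["dosages"]) <= MAX_MISSING]
    counts["missing"] = len(kept)
    kept = [s for s in kept if len(s["alt"]) == 1]          # biallelic only
    counts["biallelic"] = len(kept)
    kept = [s for s in kept if minor_allele_frequency(s["dosages"]) >= MIN_MAF]
    counts["maf"] = len(kept)
    return kept, counts


snps = [
    {"id": "s1", "alt": ["T"], "dosages": [0, 1, 2, 4, 3]},
    {"id": "s2", "alt": ["G"], "dosages": [None, None, 1, 2, 0]},   # 40% missing
    {"id": "s3", "alt": ["A", "C"], "dosages": [0, 1, 2, 3, 4]},    # multiallelic
    {"id": "s4", "alt": ["C"], "dosages": [0, 0, 0, 0, None]},      # monomorphic
]
kept, counts = filter_snps(snps)
```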

SNP Distribution

Kinship analysis and population structure: GBS vs SolCAP.
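A kinship matrix can be estimated from the dosage matrix. The sketch below uses one common generalisation of VanRaden's genomic relationship matrix to autopolyploids (dosages centred by ploidy times allele frequency, scaled by the summed expected dosage variance); it is an assumption for illustration, not necessarily the estimator used for these slides, and it requires a matrix without missing values.

```python
"""Minimal sketch: VanRaden-style kinship for tetraploid dosages.

Assumption (illustrative only): K = Z Z' / sum(ploidy * p * (1 - p)), with
Z = dosages - ploidy * p; no missing values allowed.
"""
import numpy as np


def kinship(dosages: np.ndarray, ploidy: int = 4) -> np.ndarray:
    """dosages: individuals x markers matrix of alt-allele dosages (0..ploidy)."""
    p = dosages.mean(axis=0) / ploidy           # alt-allele frequency per marker
    z = dosages - ploidy * p                    # centre each marker column
    denom = np.sum(ploidy * p * (1 - p))        # expected dosage variance summed over markers
    return z @ z.T / denom


toy = np.array([[0, 4],
                [4, 0],
                [2, 2]])
K = kinship(toy)
```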

Association panel subgroups: Old/European, USA, breeding lines, European/UK and UK.

Genomic distribution of FST
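Per-SNP FST between panel subgroups can be computed from subgroup allele frequencies. The sketch below uses the simple Nei-style form FST = (HT - HS) / HT with equal subgroup weights and no sample-size correction; the slides do not say which estimator was used, so treat this as an assumption.

```python
"""Minimal sketch: Nei-style FST for one biallelic SNP across subgroups.

Assumptions (illustrative only): equal subgroup weights, no sample-size
correction; allele frequencies come from tetraploid dosages.
"""


def allele_frequency(dosages, ploidy=4):
    """Alt-allele frequency from dosages 0..ploidy, ignoring missing (None)."""
    called = [d for d in dosages if d is not None]
    return sum(called) / (ploidy * len(called))


def gene_diversity(p):
    """1 - p^2 - (1 - p)^2 for a biallelic SNP, i.e. 2p(1 - p)."""
    return 2 * p * (1 - p)


def fst(subgroup_freqs):
    """(HT - HS) / HT, with HS the mean within-subgroup diversity."""
    hs = sum(gene_diversity(p) for p in subgroup_freqs) / len(subgroup_freqs)
    p_bar = sum(subgroup_freqs) / len(subgroup_freqs)
    ht = gene_diversity(p_bar)
    return 0.0 if ht == 0 else (ht - hs) / ht


print(fst([0.5, 0.5]), fst([0.0, 1.0]))  # 0.0 1.0
```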

Polymorphism Information Content (PIC)
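PIC is commonly computed with the formula of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i are allele frequencies; assuming that definition is the one used here, a biallelic SNP has a maximum PIC of 0.375 at p = 0.5. A minimal sketch:

```python
"""Minimal sketch of the Botstein et al. (1980) PIC formula."""
from itertools import combinations


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    pair_term = sum(2 * (pi ** 2) * (pj ** 2) for pi, pj in combinations(freqs, 2))
    return 1 - homozygosity - pair_term


print(pic([0.5, 0.5]))  # 0.375, the biallelic maximum
```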

Linkage mapping vs association mapping (adapted from Soto-Cerda and Cloutier, 2012; InTech). Linkage mapping: QTL segregation in offspring; marker-trait linkage within family; poor resolution (0.1 to 15 cM); samples a subset of the genetic variation (only the portion segregating in sampled pedigrees). Association mapping: QTL segregation in the population; marker-trait LD in the collected population; resolution can be excellent (10s to 1000s of kb); samples a larger subset of the genetic variation (theoretically all variation segregating in targeted regions).

Genome-wide association analysis (GWAS): ~300 tetraploid genotypes, genotyped with the 8K Infinium array and GBS; field trials evaluating >20 traits at two sites over two years (Cygnet Plant Breeders Ltd).

Phenotypic trait correlations and inter-replicate reliability: Cambridge N+, York N+, Cambridge N- and York N- (Cygnet Plant Breeders Ltd).

Different model tests for GWAS, flesh colour locus on chromosome III: full model (Q+K), structure (Q) only, kinship (K) only and naïve model (no Q or K).
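The naïve and structure-only (Q) models can be sketched as an ordinary least-squares test of an additive dosage effect, with or without structure covariates. This is a simplified sketch on simulated data: the p-value uses a normal approximation to the t distribution, and the kinship (K) and full (Q+K) models need a linear mixed model, which is not implemented here.

```python
"""Minimal sketch: naive vs structure-adjusted (Q) single-marker tests.

Assumptions (illustrative only): simulated data; additive dosage coding 0..4;
normal approximation for the p-value; no kinship (K) term.
"""
import math

import numpy as np


def marker_test(y, dosage, covariates=None):
    """OLS test of an additive dosage effect.

    covariates=None is the naive model; passing Q membership columns gives a
    Q-only model (drop one Q column, since proportions sum to 1 and would be
    collinear with the intercept).
    Returns (effect per extra allele copy, t statistic, approximate p-value).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    columns = [np.ones(n)]
    if covariates is not None:
        columns.extend(np.asarray(covariates, dtype=float).reshape(n, -1).T)
    columns.append(np.asarray(dosage, dtype=float))   # 0..4 copies in a tetraploid
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (n - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
    t = beta[-1] / se
    p = math.erfc(abs(t) / math.sqrt(2))              # two-sided normal approximation
    return float(beta[-1]), float(t), p


rng = np.random.default_rng(1)
n = 300
q = rng.random(n)                        # hypothetical single structure covariate
dosage = rng.integers(0, 5, size=n)      # simulated allele dosages 0..4
y = 0.5 * dosage + 2.0 * q + rng.normal(0.0, 1.0, size=n)

effect_naive, _, p_naive = marker_test(y, dosage)
effect_q, _, p_q = marker_test(y, dosage, covariates=q)
```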

Model Fitness criteria

Quantile-quantile plots: full model (Q+K), structure (Q) only, kinship (K) only and naïve model (no Q or K).
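The points of a Q-Q plot, and the genomic inflation factor λ often reported alongside it, can be computed directly from the GWAS p-values. A well-calibrated model gives λ close to 1; values well above 1 suggest uncorrected structure or relatedness. This is a sketch using only the Python standard library; whether λ was among the fitness criteria used here is not stated on the slides.

```python
"""Minimal sketch: Q-Q plot coordinates and genomic inflation factor (lambda)."""
import math
from statistics import NormalDist, median


def qq_points(pvalues):
    """Pair expected and observed -log10(p), both sorted ascending."""
    n = len(pvalues)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    observed = sorted(-math.log10(p) for p in pvalues)
    return list(zip(expected, observed))


def genomic_inflation(pvalues):
    """Median observed 1-df chi-square divided by its median under the null."""
    normal = NormalDist()
    chi2 = [normal.inv_cdf(p / 2) ** 2 for p in pvalues]  # p must be in (0, 1]
    null_median = normal.inv_cdf(0.75) ** 2               # about 0.455
    return median(chi2) / null_median


null_p = [(i - 0.5) / 1000 for i in range(1, 1001)]      # perfectly uniform p-values
lam = genomic_inflation(null_p)                          # close to 1
```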

GWAS: flesh colour (SolCAP SNPs vs GBS SNPs)

GWAS: tuber shape (SolCAP SNPs vs GBS SNPs)

GWAS: After Cooking Blackening

GWAS: Stolon Attachment

Development of diagnostic markers: KASP™ (Kompetitive Allele Specific PCR) genotyping assay, with allelic discrimination plots.
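In an allelic discrimination plot each sample is a point of FAM vs HEX fluorescence. A deliberately simplified sketch of dosage calling in a tetraploid assigns each sample to the nearest of five classes from the FAM share of its total signal. Which allele carries the FAM label, the normalised intensity units and the no-call threshold are all hypothetical here; real KASP analysis software calls genotypes by clustering.

```python
"""Simplified sketch: tetraploid dosage calls from KASP FAM/HEX signals.

Assumptions (illustrative only): intensities are normalised; the FAM-labelled
allele is counted; min_total is a hypothetical no-call threshold.
"""


def call_dosage(fam, hex_signal, ploidy=4, min_total=0.2):
    """Nearest dosage class 0..ploidy of the FAM-reporting allele, or None."""
    total = fam + hex_signal
    if total < min_total:          # weak signal, e.g. a no-template well
        return None
    return round(ploidy * fam / total)


calls = [call_dosage(f, h) for f, h in [(1.0, 0.0), (0.75, 0.25), (0.5, 0.5), (0.05, 0.05)]]
```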

Conclusions: GBS validated in tetraploid (4x) material, giving ~45k robust SNPs for basic and applied research; combines marker discovery and genotyping in a single approach; GBS data is dynamic and not affected by ascertainment bias; GBS SNPs are concentrated in gene-rich regions; population structure in the tetraploid material is weak; good associations with both simple and more complex traits; potential for developing diagnostic markers.

Acknowledgements: CMS, JHI (Glenn Bryan, Finlay Dale, Karen McLean, Malcolm Macaulay); ICS, JHI (David Marshall, Micha Bayer); industrial partners (Cygnet Plant Breeders Ltd, McCain Foods (GB) Ltd, Mylnefield Research Services Ltd); PGSC; BioSS (Katrin MacKenzie).

Traits are highly heritable; codominant, additive mode of inheritance (slide from Leopold Parts, EMBL; photo by Rene Maltete).