GBS Usage Cases: Examples from Maize


Jeff Glaubitz (jcg233@cornell.edu)
Senior Research Associate, Buckler Lab, Cornell University; Panzea Project Manager
Cornell IGD/CBSU Workshop, June 17-18, 2013

Some potential applications of GBS data
- Marker discovery
- Phylogeny/Kinship
- Linkage mapping of QTL in a biparental cross
- Fine mapping QTL
- Genomic selection
- Genome-Wide Association Studies (GWAS)
- NAM GWAS
- Improving reference genome assembly

Marker Discovery
- GBS markers can be converted to SNP assays or PCR assays of indels
- Develop SNP assays from polymorphic tags at the same location
- Develop PCR primers from adjacent tags and hope for large indels
[Figure: ApeKI site (GCWGC); 64-base sequence tag; Sample1 vs. Sample2; loss of cut site; < 450 bp]

Phylogeny/Kinship
- Missing data are not an issue for estimating pairwise genetic distance or kinship
  - Each pair of individuals has a large, random sample of markers in common
  - Works really well even in non-model organisms (Fei Lu's previous talk on switchgrass)
- Principal Coordinates Analysis is better than Principal Components Analysis
  - Uses a distance matrix rather than every genotype
  - Missing data are not an issue for Principal Coordinates Analysis
- SNPs can be strongly affected by ascertainment bias
  - The panel used to discover the SNPs can severely distort estimates of population genetic parameters (e.g., kinship, diversity)
  - Industry SNPs on the Maize 55K SNP chip are an extreme example
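The point about missing data can be illustrated with a minimal sketch: a pairwise distance that uses only the sites called in both samples, so each pair is compared on its own shared subset of markers. The 0/1/2 allele-dosage coding, the use of `None` for a missing call, the distance definition, and the sample names are all assumptions made for illustration; they are not taken from the talk or from any particular pipeline.

```python
from itertools import combinations

MISSING = None  # assumed placeholder for a site with no genotype call


def pairwise_distance(a, b):
    """Mean allele-dosage difference over sites called in BOTH samples.

    Genotypes are assumed to be coded 0, 1, 2 (copies of one allele),
    so the largest possible difference at one site is 2.
    """
    shared = [(x, y) for x, y in zip(a, b)
              if x is not MISSING and y is not MISSING]
    if not shared:
        return float("nan")  # no markers in common for this pair
    return sum(abs(x - y) for x, y in shared) / (2 * len(shared))


def distance_matrix(genotypes):
    """Symmetric distance lookup keyed by (sample, sample)."""
    names = sorted(genotypes)
    dist = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        dist[(p, q)] = dist[(q, p)] = pairwise_distance(genotypes[p], genotypes[q])
    return dist


# Hypothetical toy data: three inbred lines, six sites, scattered missing calls.
genotypes = {
    "line_A": [0, 0, None, 2, 2, None],
    "line_B": [0, None, 2, 2, 0, 2],
    "line_C": [2, 2, 2, None, 0, 2],
}
dist = distance_matrix(genotypes)
```

A matrix like `dist` is the kind of input a Principal Coordinates Analysis works from, which is why per-genotype missingness matters less there than in an analysis that needs every genotype.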

Less ascertainment bias than array-based SNPs? (Ram Sharma)
[Figure: labels B73, Mo17, (ss), (nss)]


Linkage mapping of QTL in a biparental cross
- In maize, we use the reference genome to order markers
- With ApeKI, there are too many markers for traditional software (MapMaker, JoinMap, R/qtl, etc.)
  - Filter for a smaller set of markers with high coverage
  - Use a 6-base cutter (or a combination of enzymes) for fewer markers with higher coverage
  - JoinMap can handle at least 3,000 markers
- Newer software?
  - MSTMap claims 10,000-100,000 markers
  - Guided Evolutionary Strategy (Mester et al. 2004, Comp Biol Chem 28: 281), currently being implemented in TASSEL
  - Others?
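The "filter for a smaller set of markers with high coverage" step could be sketched as below. This is only an illustration: "coverage" is interpreted here as the per-marker call rate (fraction of lines with a genotype call), and the function names, the 0.8 threshold, the `None` missing-call convention, and the marker names are hypothetical. The 3,000-marker default simply echoes the JoinMap figure on the slide.

```python
def call_rate(calls):
    """Fraction of individuals with a non-missing call at one marker."""
    return sum(c is not None for c in calls) / len(calls)


def filter_markers(markers, min_call_rate=0.8, max_markers=3000):
    """Return names of well-covered markers, in their original (genome) order.

    markers: list of (name, calls) pairs; calls uses None for missing data.
    If more than max_markers pass the threshold, only the best-covered
    ones are kept (ties resolved by original order).
    """
    rates = {name: call_rate(calls) for name, calls in markers}
    passing = [name for name, _ in markers if rates[name] >= min_call_rate]
    if len(passing) > max_markers:
        ranked = sorted(passing, key=lambda n: rates[n], reverse=True)
        best = set(ranked[:max_markers])
        passing = [n for n in passing if n in best]
    return passing


# Hypothetical toy data: four markers scored on five RILs.
markers = [
    ("S1_100", [0, 2, None, 0, 2]),
    ("S1_250", [None, None, 2, 0, None]),
    ("S1_900", [0, 0, 2, 2, 0]),
    ("S1_1500", [2, None, 2, 0, 0]),
]
kept_all = filter_markers(markers)
kept_two = filter_markers(markers, max_markers=2)
```

Keeping the surviving markers in their original order preserves the reference-genome ordering that the slide relies on for maize.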

HMM for calling hets/correcting errors in biparental RIL populations (Peter Bradbury)
$P(G \mid S) = \dfrac{P(S \mid G)\,P(G)}{P(S)}$
[Figure: trellis over Locus 1, Locus 2, Locus 3, Locus 4 with three states per locus: homozygous parent 1, heterozygous, homozygous parent 2]
distance between nodes $= \log[P(\mathrm{obs}_i \mid g_i)\,P(g_i \mid g_{i-1})]$
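The trellis described on this slide can be sketched as a three-state Viterbi search in which each step adds log[P(obs_i | g_i) P(g_i | g_{i-1})] and the highest-scoring path is kept. Everything numeric below is an assumption chosen for illustration (genotyping error rate, the probabilities with which a true heterozygote is observed as either homozygote at low coverage, the switch probability between adjacent markers, and the prior); none of it is taken from the implementation referred to in the talk.

```python
import math

STATES = ("A", "H", "B")  # homozygous parent 1, heterozygous, homozygous parent 2


def emission(obs, state, err=0.01):
    """P(observed call | true genotype); all values are illustrative assumptions."""
    if obs is None:          # missing call carries no information
        return 1.0
    if state == "H":         # low coverage: a het is usually seen as a homozygote
        return {"A": 0.4, "H": 0.2, "B": 0.4}[obs]
    return 1.0 - err if obs == state else err / 2


def viterbi(observations, prior=(0.49, 0.02, 0.49), switch=0.01):
    """Most probable genotype path along one chromosome of one RIL."""
    def trans(prev, cur):
        return 1.0 - switch if prev == cur else switch / 2

    score = [math.log(p) + math.log(emission(observations[0], s))
             for p, s in zip(prior, STATES)]
    back = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for cur in STATES:
            cands = [score[i] + math.log(trans(prev, cur))
                     for i, prev in enumerate(STATES)]
            best = max(range(len(STATES)), key=cands.__getitem__)
            new_score.append(cands[best] + math.log(emission(obs, cur)))
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(len(STATES)), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(STATES[i] for i in reversed(path))


# Hypothetical raw calls: a parent-1 block (one missing call), a region where
# calls flip between parents (as expected for a true het), then a parent-2 block.
raw = list("AAA") + [None] + list("AA") + list("BABHBA") + list("BBBBBB")
called = viterbi(raw)
isolated_error = viterbi(list("AAAABAAAA"))
```

With these assumed parameters, a single discordant call inside a long parental block is treated as an error, while a run of alternating calls is explained more cheaply by one heterozygous segment than by repeated crossovers or repeated errors.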

HMM for calling hets/correcting errors in biparental RIL populations (Peter Bradbury)
[Figure: actual calls vs. calls after HMM for a low % het RIL, a medium % het RIL, and a high % het RIL; genotype classes non-B73, het, B73; x-axis: physical position on Chr1 (bp)]

GBS analysis of Teosinte/W22 BC2S3
- 868 RILs in total
- Framework map with 492 SNPs
- 51,238 biallelic GBS markers (ApeKI)

TeoW22 BC2S3, Chr9: 4,122 biallelic GBS markers
[Figure: genotype classes W22, Teo, Het]

Reproducibility: 60 RILs, Chr9
[Figure: genotype classes W22, Teo, Het]

GBS analysis of Teosinte/W22 BC2S3
- 868 RILs in total
- Framework map with 492 SNPs
- 51,238 biallelic GBS markers (ApeKI) (96-plex)

Maize Sh1 orthologs are located at seed shattering QTLs

Fine mapping QTL
- Need to saturate the interval containing the QTL with markers
  - GBS is a good source of markers
- Also need to collect recombinants in the interval
- Near-isogenic lines (NILs) are helpful (Mendelize)
- Good reference genome
[Figure: Marker1 and Marker2; x-axis: map position (cM)]

Fine Mapping of Domestication QTL in Maize
[Figure: 7,756 GBS SNPs along Chr? in 136 RC-NILs; genotype classes W22, Teo, Het; panels: Trait Y, Trait Z]

Genomic Selection & GWAS
- Complete data not required for genomic selection?
  - Closely linked markers in LD cover for each other

Accurate genomic prediction of height based on GBS data (Jason Peiffer)
- 2,800 diverse maize breeding lines
- USDA Ames inbreds
- Ridge regression BLUP
- R^2 = 0.82
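As a rough sketch of what ridge regression on marker data involves, the code below shrinks all marker effects equally by solving (Z'Z + lambda I) u = Z'(y - mean(y)) and predicts a new line as the mean plus the sum of its marker effects. The fixed `lam` value, the -1/0/1 genotype coding, the function names, and the toy data are all assumptions for illustration. In practice the shrinkage parameter is usually estimated from variance components rather than fixed by hand, and this is not the analysis behind the R^2 reported on the slide.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rr_blup(Z, y, lam):
    """Return (mean, marker effects) from ridge regression with penalty lam."""
    n, n_markers = len(Z), len(Z[0])
    mu = sum(y) / n
    yc = [v - mu for v in y]
    lhs = [[sum(Z[i][a] * Z[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(n_markers)] for a in range(n_markers)]
    rhs = [sum(Z[i][a] * yc[i] for i in range(n)) for a in range(n_markers)]
    return mu, solve(lhs, rhs)


def predict(mu, effects, genotype):
    """Predicted phenotype for one line from its marker genotypes."""
    return mu + sum(g * u for g, u in zip(genotype, effects))


# Hypothetical toy data: four lines, two markers coded -1/0/1, one trait.
Z = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [12.0, 14.0, 8.0, 6.0]
mu, effects = rr_blup(Z, y, lam=2.0)
new_line = predict(mu, effects, [1, 1])
```

Because every marker receives a shrunken effect, a missing or noisy call at one site costs little when its closely linked neighbours carry the same signal, which is the sense in which linked markers "cover for each other".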

Genomic Selection & GWAS
- Complete data not required for genomic selection?
  - Closely linked markers in LD cover for each other
- Missing data are more problematic for GWAS
  - Imputation is necessary, but might cause spurious results
  - Avoid false imputation of biologically missing regions
  - Area of active research

GWAS directly hits known Mendelian traits (Cinta Romay)
- GWAS of white vs. yellow kernels in 1,595 USDA Ames inbreds
- The best hit for kernel color lies within Y1

GWAS of a more complex trait directly hits known flowering time genes (Cinta Romay)
- GWAS of growing degree days to silking in 2,279 inbreds
- Even with ~660k SNPs we almost missed ZmCCT (only 1 significant SNP)
[Figure: ZmRap2.7 (2.6 kb) and ZmCCT (2.4 kb); 1 SNP vs. 80 SNPs]

GWAS of a more complex trait directly hits known flowering time genes (Alex Lipka, Zhiwu Zhang)
- GWAS of growing degree days to silking in 2,279 inbreds
[Figure: ZmRap2.7 (2.6 kb) and ZmCCT (2.4 kb); 1 SNP vs. 80 SNPs]

Genomic Selection & GWAS
- Complete data not required for genomic selection?
  - Closely linked markers in LD cover for each other
- Missing data are more problematic for GWAS
  - Imputation is necessary, but might cause spurious results
  - Avoid false imputation of biologically missing regions
  - Area of active research
- In NAM GWAS, imputation is much less of an issue
  - NAM = Nested Association Mapping population

The maize NAM population was built for NAM-GWAS
[Figure: NAM crossing scheme: 25 DL x B73, F1s, SSD, NAM RILs 1, 2, ... 200]
The 25 DL parents: B97, CML103, CML228, CML247, CML277, CML322, CML333, CML52, CML69, Hp301, Il14H, Ki11, Ki3, Ky21, M162W, M37W, Mo18W, MS71, NC350, NC358, Oh43, Oh7B, P39, Tx303, Tzi8

liguleless1 and liguleless2 explain the two biggest leaf angle QTL
[Figure: Upper leaf angle across chromosomes 1-10; associations with positive effect, associations with negative effect, and linkage QTL peaks; axes: BPP, cM/Mb, log(p); labeled loci lg1, lg2, lg3, lg4]
Tian, Bradbury, et al. 2011, Nature Genetics

ligule Photo: Jason Wallace

We are using GBS to pinpoint the location of crossovers in the NAM RILs
- B73 is the reference genome: complete knowledge
- Remaining NAM parents whole-genome sequenced via Illumina at 4x coverage (paired-end, random-sheared)
  - 26 million high-quality SNPs
- Precise knowledge of crossover locations in NAM RILs allows us to more accurately project sequences of parents onto RILs
[Figure: genotype classes B73, MS71; panels: "1100 SNPs: Z019E001" and "GBS SNPs: Z019E001"]

GBS data improves the resolution of joint linkage analysis in the NAM population (Peter Bradbury)
Trait: Days to silk (flowering time)

Markers             Median QTL support interval
1,106 array SNPs    6.2 cM
171,479 GBS SNPs    2.6 cM

Preliminary NAM GWAS results also show improved resolution

Recombination Rates for NAM from GBS Data Peter Bradbury USDA Scientist, Buckler lab, Cornell (unpublished)

The maize B73 reference genome: room for improvement?
1) The B73 reference genome is accurate for B73 but less so for other maize lines (e.g., Mo17)
2) Even for B73, some regions of the genome are in the wrong place
3) Some large (multiple-BAC) contigs could not be anchored and were assigned to chromosome 0
   - 30 chr0 contigs in B73 RefGenV1
   - 17 chr0 contigs in B73 RefGenV2
4) Some regions of the genome are missing
   - 5% of B73 sequence is not in the B73 reference genome


Most tags can be mapped as individual alleles
- In a biparental cross such as maize IBM (B73 x Mo17)
- Provided that they are polymorphic between the parents
[Figure: ApeKI site (GCWGC); 64-base sequence tag; B73 vs. Mo17; loss of cut site; < 450 bp]

Genetically mapping individual GBS alleles in IBM

Min # successes   p-value   Max recomb.   Total # GBS tags mapped   # B73 tags mapped   # Mo17 tags mapped
10                <10^-3    <5%           485,860                   266,192             219,668
20                <10^-6    <5%           235,531                   123,094             112,437
30                <10^-7    <5%           140,713                   73,829              66,884
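The slide does not spell out the test behind these thresholds, so the following is only one plausible reading, offered as a sketch: for a given tag and anchor marker, count the RILs in which the tag is observed, call it a "success" when that RIL carries the tag's parental allele at the marker, and compute a one-sided binomial p-value against a 50:50 expectation. Under that assumption, 10, 20, and 30 successes out of as many trials give p-values of 0.5^10, 0.5^20, and 0.5^30, which fall under the 10^-3, 10^-6, and 10^-7 cutoffs in the table. The function names and data layout are hypothetical.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """One-sided P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))


def score_tag_marker(tag_present, carries_parent_allele):
    """Test whether a tag co-segregates with one parent's allele at a marker.

    tag_present: per-RIL booleans, True if the tag was observed in that RIL.
    carries_parent_allele: per-RIL True/False/None, whether the RIL carries the
        tag's parental allele at the anchor marker (None = marker missing).
    Returns (successes, trials, p_value, recombination_fraction).
    """
    successes = trials = 0
    for present, from_parent in zip(tag_present, carries_parent_allele):
        if present and from_parent is not None:
            trials += 1
            successes += bool(from_parent)
    if trials == 0:
        return 0, 0, 1.0, float("nan")
    p_value = binomial_tail(successes, trials)
    return successes, trials, p_value, 1 - successes / trials


# Hypothetical toy data for one tag and one anchor marker across 16 RILs.
tag_present = [True] * 12 + [False] * 4
carries_parent_allele = [True] * 11 + [False] + [None, True, False, False]
successes, trials, p_value, recomb = score_tag_marker(tag_present, carries_parent_allele)
```

Only RILs in which the tag is actually observed contribute to the count, so absence of a tag (which at low coverage may just be missing data) is never treated as evidence.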


B73 reference genome highly accurate for B73
- 0.4% of B73 tags genetically map to a different chromosome than they align to

B73 reference genome highly accurate for B73
- 0.4% of B73 tags genetically map to a different chromosome than they align to
...but far less so for other maize lines
- 9.3% of Mo17 tags genetically map to a different chromosome than they align to

Only 50% of the maize genome is shared between two varieties
[Figure: Maize (Plant 1, Plant 2, Plant 3): 50% shared; Humans (Person 1, Person 2, Person 3): 99% shared]
Fu & Dooner 2002; Morgante et al. 2005; Brunner et al. 2005
Numerous PAVs and CNVs: Springer, Lai, Schnable in 2010

Some chunks of the B73 reference genome are in the wrong place

Physical Chr   Start (Mb)   End (Mb)   Genetic Chr   Approx. Genetic Location (Mb)   # Tags
10             139.3        139.8      2             16.5-16.8                       49
9              102.5        106.9      9             15-32                           49
7              150.1        161.8      5             192-214                         13
10             0.2          0.4        4             83-151                          12
8              48.4         50         2             61-127                          12
10             0.07         0.2        7             47-100                          9
2              231.2        231.2      7             18-26                           8
3              228.1        230.5      5             194-212                         6

Some potential applications of GBS data
- Marker discovery
- Phylogeny/Kinship
- Linkage mapping of QTL in a biparental cross
- Fine mapping QTL
- Genomic selection
- Genome-Wide Association Studies (GWAS)
- NAM GWAS
- Improving reference genome assembly