Molecular marker redundancy check and construction of a high density genetic map of tetraploid cotton. Jean-Marc Lacape


Molecular marker redundancy check and construction of a high density genetic map of tetraploid cotton. Jean-Marc Lacape. Raleigh, ICGI, October 10, 2012

Background (ICGI 2008, Anyang)

Numerous marker projects (several tens of thousands of markers)
- SSRs: BNL, CIR, etc.
- SNPs, etc.

Several genetic/QTL maps (several tens)
- Interspecific (Gh x Gb)
- Intraspecific (Gh x Gh)
- Others

Genome sequencing
- D-genome
- (A-genome)
- (AD-genome)

Structural Genetics WG priorities
1- Marker nomenclature, synonymy and redundancy
2- Map convergence, and building of a consensus map
3- Convergence genetic/physical

Objectives
1- Curate redundancy among SSR markers
2- Build a High Density Consensus (HDC) map from 6 component maps
3- Conduct genetic/physical map alignments

1- Marker redundancy

17343 markers from CMD (18 different libraries) as of April 2012
- pair-wise sequence alignment (Smith-Waterman algorithm)
- similarity threshold: 90%

Library   No. of sequences in library
BNL       379
CIR       392
CM        53
DOW       52
DPL       200
Gh        700
HAU       3383
JESPR     309
MGHES     84
MON       2568
MUCS      617
MUSB      1316
MUSS      554
NAU       3249
NBRI      2233
PGML      308
STV       192
TMB       754
Total     17343
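The pairwise comparison named on this slide (Smith-Waterman local alignment with a 90% similarity threshold) can be sketched as below. This is a minimal illustration, not the pipeline actually run on the CMD sequences: the scoring values (match 2, mismatch -1, gap -2), the definition of similarity as identical aligned positions divided by the length of the shorter sequence, and the function names are all assumptions made for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b (linear gap penalty).

    Returns (best_score, identical_positions, aligned_length).
    Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            s = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)

    # Trace back from the best cell until the local alignment starts (score 0).
    i, j = best_pos
    identical = length = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return best, identical, length


def similarity(a, b):
    """Identical aligned positions over the shorter sequence length (assumed definition)."""
    _, identical, _ = smith_waterman(a, b)
    return identical / min(len(a), len(b))


def is_redundant(a, b, threshold=0.90):
    """True when two marker sequences reach the similarity threshold."""
    return similarity(a, b) >= threshold
```

Under these assumptions, two 20-base sequences that differ by a single substitution are flagged as redundant, while unrelated sequences are not. A production check would normally rely on an optimized aligner rather than this quadratic pure-Python loop.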

1- Marker redundancy

At the 90% sequence similarity threshold, 5741 markers, or 20%, have a match with at least one other marker of the database (either from the same or from a different library).

Pairwise matches by library. Columns, in order: BNL CIR CM DOW DPL Gh HAU JESPR MGHES MON MUCS MUSB MUSS NAU NBRI PGML STV TMB. Empty cells are not shown, so the values of each row are listed left to right without their column positions.
BNL: 9 10 1 4 8 4 4 16 2 4 1 3
CIR: 6 7 2 3 8 2 1 1 40
CM: 5 1 3 28 3 1
DOW: 1 1 3 1
DPL: 1 2 5 2 5 3 3
Gh: 103 5 29 1 27 1 1 4 10 1
HAU: 646 3 49 244 145 1 206 872 136 55 46 3
JESPR: 14 4 11 4 2 1
MGHES: 3 40 6 10 52 3 2
MON: 183 14 2 22 182 56 6 8 6
MUCS: 54 162 114 28 22
MUSB: 168 12 1
MUSS: 5 224 26 19
NAU: 593 123 50 115
NBRI: 146 5 4 1
PGML: 16 2 3
STV: 12
TMB: 35

1- Marker redundancy: level of redundancy within libraries

Within libraries, redundancy was highest (>10%) in Gh, MUSB, NAU and HAU.

1- Marker redundancy: level of redundancy between libraries

Between libraries, matches were found amongst most libraries.

1- Marker redundancy

Library   No. of sequences in library   Total cross-library matches (pair-wise)
BNL       379                           57
CIR       392                           74
CM        53                            37
DOW       52                            5
DPL       200                           26
Gh        700                           102
HAU       3383                          1776
JESPR     309                           90
MGHES     84                            167
MON       2568                          647
MUCS      617                           494
MUSB      1316                          17
MUSS      554                           669
NAU       3249                          1763
NBRI      2233                          400
PGML      308                           165
STV       192                           175
TMB       754                           62
Total     17343                         6726

Our sequence-based redundancy check aimed at recovering identical-by-sequence SSR markers (cross-PCR-amplification), thus recovering as «redundants»:
- homoeologs, paralogs and other copies
- sequences with multiple SSRs
- alleles, etc.

It is necessary to account for marker/locus redundancy for (1) across-map alignments and (2) consensus map construction.

2- A High Density Consensus map

Six selected interspecific Gh x Gb genetic maps (only map positions, no raw codings)

Map code   Parents             Generation   References
T3         TM-1 x 3-79         RIL          Yu et al (2012)
GV         Guazuncho 2 x VH8   BC1-RIL      Lacape et al (2009)
PK         Palmeri x K101      F2           Rong et al (2004)
TH         TM-1 x Hai 7124     BC1          Guo et al (2008)
E3         Emian22 x 3-79      BC1          Yu et al (2011)
CH         CRI 36 x Hai 7124   F2           Yu et al (2007)

- standardize marker names (BNL119 = BNL0119)
- verify orientation
- acronyms: T3, GV, PK, TH, E3 and CH
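The name standardization step (BNL119 = BNL0119) can be illustrated with a small comparison key. This is a sketch under assumptions, not the procedure used for the six maps: it assumes marker names consist of a letter prefix, a number and an optional letter suffix, and that case and leading zeros carry no meaning. The function name `marker_key` is hypothetical. Comparing keys instead of padding to a fixed width avoids guessing how many digits each library uses (for example STV069 and MGHES10 in the slides).

```python
import re

# Letter prefix, optional separator, digits, optional letter suffix (assumed naming pattern).
_NAME = re.compile(r"([A-Za-z]+)[\s_-]*(\d+)([A-Za-z]*)")


def marker_key(name):
    """Return a comparison key so that BNL119 and BNL0119 are treated as the same marker."""
    m = _NAME.fullmatch(name.strip())
    if m is None:
        # Names that do not follow the assumed pattern are only trimmed and upper-cased.
        return (name.strip().upper(), None, "")
    prefix, number, suffix = m.groups()
    return (prefix.upper(), int(number), suffix.lower())


# Example: both spellings collapse to the same key.
same = marker_key("BNL119") == marker_key("BNL0119")
```

Two markers from different maps would then be considered the same named marker when their keys are equal, before any sequence-based redundancy is taken into account.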

2- A High Density Consensus map

The six maps were saturated with a combination of different marker types (SSRs predominant).

                                          No. of mapped markers
Map code   Generation   Pop size   cM     Total   RFLP   AFLP   SSR    SNP   Others
T3         RIL          186        3380   2072    -      -      1825   247   -
GV         BC1-RIL      215        3637   1745    190    715    781    -     59
PK         F2           57         4448   2584    2459   -      124    -     1
TH         BC1          138        3541   2247    -      71     1865   10    301
E3         BC1          141        4419   2316    -      -      2311   -     5
CH         F2           186        4418   1080    -      93     690    -     297
Total                                     12044   2649   879    7596   257   663

2- A High Density Consensus map

The six maps have a good level of connections: as many as 10086 bridges (same chromosome) altogether.

No. of bridges (same chromosome), before redundancy curation

Map   T3    GV    PK    TH    E3    CH    Total between
T3    105   422   63    408   837   351   2081
GV    -     37    221   350   323   329   1645
PK    -     -     76    69    64    79    496
TH    -     -     -     66    929   283   2039
E3    -     -     -     -     193   315   2468
CH    -     -     -     -     -     31    1357

Probably overestimated because of redundancy.

2- A High Density Consensus map

Following the redundancy check, groups of suspected redundant markers (2357 clusters with up to 14 SSRs) were assigned a collective name CLU (cluster):
- HAU1430, NAU3074, STV069: 3 markers renamed as CLU1057
- JESPR240, HAU2108, MGHES10, NAU3533, NAU5239: 5 markers renamed as CLU1355

Additional evidence of redundancy comes from map data.
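Grouping pairwise matches into clusters such as CLU1057 and CLU1355 can be sketched with a union-find structure. This is an illustration under assumptions: it assumes single-linkage grouping (any chain of pairwise matches joins markers into one cluster), which the slides do not state, and the pairwise matches in the example are invented; only the resulting cluster memberships come from the slide. The function name `cluster_markers` is hypothetical.

```python
def cluster_markers(pairs):
    """Group markers connected by pairwise matches (single linkage, assumed)."""
    parent = {}

    def find(x):
        # Follow parent links to the representative, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for marker in list(parent):
        clusters.setdefault(find(marker), set()).add(marker)
    # Sorted members, clusters ordered by their first member, for a stable output.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical pairwise matches that would yield the two clusters shown on the slide.
example_pairs = [
    ("HAU1430", "NAU3074"), ("NAU3074", "STV069"),
    ("JESPR240", "HAU2108"), ("HAU2108", "MGHES10"),
    ("MGHES10", "NAU3533"), ("NAU3533", "NAU5239"),
]
example_clusters = cluster_markers(example_pairs)
```

Each resulting group would then receive one collective CLU name that replaces the individual marker names on every component map.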

Map data as additional evidence of redundancy

[Figure: component maps (T3, TH, E3, CH, GV) on which the members of each cluster are mapped. CLU1355 (JESPR240, NAU5239, MGHES10, NAU3533, HAU2108) maps to chromosomes c1 and c15; CLU1057 (HAU1430, STV069, NAU3074) maps to chromosomes c11 and c21.]

2- A High Density Consensus map

The redundancy check (replacing marker names by a collective CLU name) resulted in:
- additional new correspondences: 2 redundant markers mapped on 2 different maps, including erroneous paralogs
- numerous spurious correspondences generated by colocalized redundant markers on the same map
- the need for a curation of duplicates both internally and between maps

All inversions between all possible pairwise comparisons had to be solved.

[Figure: three Map A / Map B comparison panels]

2- A High Density Consensus map

Marker redundancy and map curation resulted in a decrease of the number of bridges (5578 between maps instead of 10086).

No. of bridges (same chromosome), after redundancy curation

Map   T3    GV    PK    TH    E3    CH    Total between
T3    13    258   37    226   398   180   1099
GV    -     17    179   196   161   178   972
PK    -     -     75    34    35    39    324
TH    -     -     -     19    583   142   1181
E3    -     -     -     -     32    143   1320
CH    -     -     -     -     -     12    682
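The totals quoted for the bridges (10086 before and 5578 after curation) can be checked from the pairwise counts. The short sketch below uses the between-map counts read from the slides; the variable and function names are illustrative. It shows that each "Total between" value is the sum of a map's bridges with the five other maps, so every map pair contributes to two row totals.

```python
MAPS = ["T3", "GV", "PK", "TH", "E3", "CH"]

# Between-map bridge counts (same chromosome), as read from the slides.
BEFORE = {
    ("T3", "GV"): 422, ("T3", "PK"): 63, ("T3", "TH"): 408, ("T3", "E3"): 837, ("T3", "CH"): 351,
    ("GV", "PK"): 221, ("GV", "TH"): 350, ("GV", "E3"): 323, ("GV", "CH"): 329,
    ("PK", "TH"): 69, ("PK", "E3"): 64, ("PK", "CH"): 79,
    ("TH", "E3"): 929, ("TH", "CH"): 283,
    ("E3", "CH"): 315,
}
AFTER = {
    ("T3", "GV"): 258, ("T3", "PK"): 37, ("T3", "TH"): 226, ("T3", "E3"): 398, ("T3", "CH"): 180,
    ("GV", "PK"): 179, ("GV", "TH"): 196, ("GV", "E3"): 161, ("GV", "CH"): 178,
    ("PK", "TH"): 34, ("PK", "E3"): 35, ("PK", "CH"): 39,
    ("TH", "E3"): 583, ("TH", "CH"): 142,
    ("E3", "CH"): 143,
}


def totals_between(pairs):
    """Per-map total of bridges shared with the other maps."""
    totals = {m: 0 for m in MAPS}
    for (a, b), n in pairs.items():
        totals[a] += n
        totals[b] += n
    return totals


before_total = sum(totals_between(BEFORE).values())  # sum of the row totals before curation
after_total = sum(totals_between(AFTER).values())    # sum of the row totals after curation
```

Because each pair is counted once for each of its two maps, the number of distinct pairwise bridges is half of each grand total.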

2- A High Density Consensus map

                      After map curation
Map code   Total      No. of unique loci   No. of loci for map integration
T3         2072       677                  1977
GV         1745       778                  1671
PK         2584       1352                 2559
TH         2247       796                  1892
E3         2316       480                  1814
CH         1080       33                   417
Total      12044      4116                 10330

Map integration (using Biomercator v3)

With 2 maps, GV and T3, given «higher confidence», an HDC map was built in 2 steps:
- Step 1: integration of GV and T3, no reference (result = GVT3)
- Step 2: iterative projections of PK, TH, E3 and CH, with GVT3 as fixed reference

2- A High Density Consensus map

Step 1 (Biomercator): integration of GV + T3, the 2 maps of 1st priority (258 bridge loci). Result: GVT3 (3538 cM, 3374 loci).

[Figure: GV and T3 maps of chromosomes c9 and c23]

2- A High Density Consensus map

Step 2 (Biomercator): the consensus of GV + T3 was used as a «fixed» framework for projecting markers from the 4 other component maps.

[Figure: chromosomes c9 and c23]
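The idea of projecting a marker from a component map onto a fixed reference map can be illustrated by linear interpolation between flanking bridge loci. This sketch is an assumption made for illustration and is not claimed to be Biomercator's implementation: it assumes the bridge loci are in the same order on both maps (inversions already resolved) and sit at distinct positions on the component map. The function name and the example positions are hypothetical.

```python
from bisect import bisect_right


def project_position(pos, bridges):
    """Project a component-map position (cM) onto the reference map.

    bridges: list of (component_cM, reference_cM) pairs for loci shared by both maps.
    Positions outside the bridged range are extrapolated from the nearest interval.
    """
    bridges = sorted(bridges)
    if len(bridges) < 2:
        raise ValueError("at least two bridge loci are needed")
    xs = [x for x, _ in bridges]
    # Index of the right-hand bridge of the interval used for interpolation.
    i = min(max(bisect_right(xs, pos), 1), len(bridges) - 1)
    (x0, y0), (x1, y1) = bridges[i - 1], bridges[i]
    if x1 == x0:
        raise ValueError("bridge loci must have distinct component-map positions")
    return y0 + (pos - x0) * (y1 - y0) / (x1 - x0)


# Hypothetical bridges: component-map cM -> reference-map cM.
example_bridges = [(0.0, 0.0), (10.0, 20.0), (30.0, 40.0)]
projected = project_position(5.0, example_bridges)  # halfway between the first two bridges
```

With a fixed reference, each component map can be projected independently, which is why the order in which PK, TH, E3 and CH are added does not move the framework loci in this simplified picture.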

2- A High Density Consensus map

The HDC map represents:
- 8254 loci (317/chr)
- 6669 «unique» markers: 3450 SSRs (727 as CLU), 1894 RFLPs
- 4070 cM, 2 loci/cM
- a single dense region per chromosome
- 1292 multi-copy markers, 64% homeoduplicates
- 4744 unique markers of the HDC map are informed by a sequence (genomic, cDNA)

[Figure: c8-c24 homeoduplicates]

3- Convergence HDC map / G. raimondii

Alignment of the sequences of:
- the 4744 markers mapped on the HDC map, and
- the 13 scaffolds (750 Mb) of G. raimondii (www.phytozome, v2.1, Sept. 2012)
using BBMH, the Best Blast (Bidirectional) Mutual Hit approach (BLASTN, 1e-5).

The 3607 single hits (4592 loci) included 2182 hits (2369 loci) for the D chromosomes of the HDC map (c14-c26), or 182 anchor loci per chromosome.

The sequences originate from different ploidy/evolutionary contexts: (1) from a tetraploid (markers on the tetraploid HDC map) and (2) from a diploid (G. raimondii genome).
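The bidirectional best-hit idea behind BBMH can be sketched as follows. This is a simplified illustration, not the script used for the alignment: it assumes hits are already parsed into (query, subject, e-value, bit score) tuples, that the best hit is the one with the highest bit score, and that the genome side is split into hypothetical target identifiers (named `win1`, `win2`, ... here). All data in the example are invented.

```python
def best_hits(hits, max_evalue=1e-5):
    """Keep, for each query, the subject with the highest bit score below the e-value cutoff."""
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(forward, reverse, max_evalue=1e-5):
    """Pairs (marker, target) that are each other's best hit in both search directions."""
    fwd = best_hits(forward, max_evalue)
    rev = best_hits(reverse, max_evalue)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Invented hits: markers searched against genome targets, and the reverse search.
forward_hits = [
    ("m1", "win1", 1e-30, 200.0), ("m1", "win2", 1e-10, 80.0),
    ("m2", "win2", 1e-20, 150.0),
    ("m3", "win3", 1e-3, 30.0),  # fails the 1e-5 cutoff
]
reverse_hits = [
    ("win1", "m1", 1e-30, 200.0),
    ("win2", "m1", 1e-10, 80.0), ("win2", "m2", 1e-20, 150.0),
]
anchors = mutual_best_hits(forward_hits, reverse_hits)
```

Requiring agreement in both directions is what limits each marker to a single anchor position, at the cost of discarding markers whose best genome match is claimed by another marker.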

3- Convergence HDC map / G. raimondii

- excellent convergence
- ratio kb/cM highly variable (average 400 kb/cM); S-shaped curve (high ratio in centromeric regions)
- cross-validation of the 2 data sets
- no apparent structural re-arrangement following polyploidization

[Figure: G. raimondii Chr13 vs HDC c18, 139 hits]

After clustering (5 colinear pairs): possible incongruency. An internal region of Gr_Chr08 has no hit on the genetic map.

[Figure: G. raimondii Chr08 vs HDC c26]

After clustering (5 colinear pairs): possible incongruency. Dual correspondence of Gr_Chr09 with c19 and c22?

[Figure: G. raimondii Chr09 vs HDC c19; G. raimondii Chr09 vs HDC c22]

Last but not least: new data on TropGeneDB/Cmap (www.tropgenedb.cirad.fr)

[Figure: Cmap comparison of chromosome c11 on the E3, HDC and T3 maps; HAU1430 = CLU1057 = STV069]

CLU markers serve as bridges between genetic maps (thanks to aliases/feature names associated with markers).

In addition, added (TropGeneDB, Cmap):
- the G. raimondii v2.1 physical map and correspondences with the HDC map
- more QTLs (meta-QTLs, expression eQTL hotspots)
- links from markers (Cmap) to a genome browser of G. raimondii

From a QTL on any component map:
- to HDC and physical maps
- to genome browser (track «genetic marker») and annotations
- to (list of) candidate gene(s) in the QTL confidence interval

tropgenedb.cirad.fr/

Any request for additional map and QTL data to be included in TropGeneDB: email marc.lacape@cirad.fr or chantal.hamelin@cirad.fr

Perspective: exchange with CottonGen

This research is published in PLoS ONE.

Acknowledgments and Contributors

Acknowledged are:
- Andrew Paterson and JGI for early access to G. raimondii scaffolds
- Johann Joets and Olivier Sosnowski for early access to Biomercator v3

Contributors:
- Anna Blenda (Clemson Un.)
- David Fang (ARS)
- Feng Luo (Clemson Un.)
- Jean-François Rami (Cirad)
- Olivier Garsmeur (Cirad)
- Chantal Hamelin (Cirad)
- Gaëtan Droc (Cirad)

Thank you, Merci...