International Training Course on Maize Molecular Breeding, April 5-16, 2010, CIMMYT, El Batan, México


1 International Training Course on Maize Molecular Breeding, April 5-16, 2010, CIMMYT, El Batan, México
Choice of Marker Systems and Genotyping Platforms
Yunbi Xu, International Maize and Wheat Improvement Center (CIMMYT), Mexico

2 Genetic Markers
Genetic markers are biological features that are determined by allelic forms and can be used as experimental probes or tags to keep track of an individual, a tissue, a cell, a nucleus, a chromosome, or a gene (Xu, 2010).
- Morphological markers: genetic stocks that differ in color, size, etc.
- Cytological markers: various aneuploids, variants of chromosome structure, and abnormal chromosomes
- Protein markers: isozymes, which differ in amino acid sequence but catalyze the same chemical reaction
- DNA markers: variation at the DNA level (RFLP, AFLP, RAPD, SSR, DArT, SNP, etc.)

3 Molecular Basis of Major DNA Markers
[Figure labels: enzyme restriction site; tandem repeat sequence; PCR primer]

4 DNA Markers and Major Molecular Techniques
Southern blot-based markers
- Restriction fragment length polymorphism, RFLP
- Single-strand conformation polymorphism RFLP, SSCP-RFLP
- Denaturing gradient gel electrophoresis RFLP, DGGE-RFLP
PCR-based markers
- Randomly amplified polymorphic DNA, RAPD
- Sequence tagged site, STS
- Sequence characterized amplified region, SCAR
- Random primer-PCR, RP-PCR
- Arbitrary primer-PCR, AP-PCR
- Oligo primer-PCR, OP-PCR
- Single-strand conformation polymorphism-PCR, SSCP-PCR
- Small oligo DNA analysis, SODA
- DNA amplification fingerprinting, DAF
- Amplified fragment length polymorphism, AFLP
- Sequence-related amplified polymorphism, SRAP
- Target region amplified polymorphism, TRAP
- Insertion/deletion polymorphism, Indel

5 DNA Markers and Related Major Molecular Techniques
Repeat sequence-based markers
- Satellite DNA (repeat unit containing several hundred to thousand bp)
- Microsatellite DNA (repeat unit containing 2-5 bp)
- Minisatellite DNA (repeat unit containing more than 5 bp)
- Simple sequence repeat, SSR or SSLP
- Short repeat sequence, SRS
- Tandem repeat sequence, TRS
mRNA-based markers
- Differential display, DD
- Reverse transcription PCR, RT-PCR
- Differential display reverse transcription PCR, DDRT-PCR
- Representational difference analysis, RDA
- Expressed sequence tags, EST
- Sequence tagged sites, STS
- Serial analysis of gene expression, SAGE
Single nucleotide polymorphism-based markers
- Single nucleotide polymorphisms, SNP

6 Restriction Fragment Length Polymorphisms
[Figure: RFLP workflow from DNA extraction to autoradiograph, comparing alleles A1 and A2. Steps shown: DNA extraction, restriction fragments, agarose gel electrophoresis, DNA denaturing, Southern blotting, hybridization, wash, autoradiograph.]

7 Restriction Fragment Length Polymorphisms
- Need a large amount of high-quality DNA
- Expensive, slow, and laborious
- Cannot be mechanized or scaled up
- Require much discovery and optimization for each species before use
- Have multiple alleles (co-dominant) and are very repeatable
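The idea behind an RFLP can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory protocol: the two allele sequences are invented for the example, the function name `digest` is hypothetical, and the only real-world fact used is that EcoRI recognizes GAATTC and cuts after the G. In practice only the fragments that hybridize to the probe would be visible on the autoradiograph.

```python
import re

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts after the G

def digest(sequence, site=ECORI_SITE, cut_offset=1):
    """Return the fragment lengths obtained by cutting at every site."""
    cut_positions = [m.start() + cut_offset for m in re.finditer(site, sequence)]
    bounds = [0] + cut_positions + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical alleles: A2 carries a point mutation (A -> G) that destroys
# the second EcoRI site, so it yields one long fragment instead of two.
allele_a1 = "ATGC" * 5 + "GAATTC" + "ATGC" * 10 + "GAATTC" + "ATGC" * 3
allele_a2 = "ATGC" * 5 + "GAATTC" + "ATGC" * 10 + "GAGTTC" + "ATGC" * 3

fragments_a1 = digest(allele_a1)
fragments_a2 = digest(allele_a2)
print("A1 fragments:", fragments_a1)
print("A2 fragments:", fragments_a2)
```

A single base change is enough to alter the fragment pattern, which is why both alleles can be distinguished in a heterozygote (co-dominance).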

8 Amplified Fragment Length Polymorphisms
[Figure: AFLP flowchart. Steps shown: whole genome DNA, restriction with EcoRI and MseI, ligation of the EcoRI and MseI adapters, pre-amplification with EcoRI and MseI primers, selective amplification, electrophoresis.]

9 Amplified Fragment Length Polymorphisms
- Combine restriction, adapters, and PCR to generate many polymorphisms per run
- Need a small amount of very high-quality DNA
- Expensive, slow, and laborious
- Cannot be mechanized or scaled up
- Do NOT require much discovery and optimization for each species before use
- Do not have multiple alleles (dominant), but give many loci per run

10 Simple Sequence Repeats (SSR, Microsatellites)
- PCR primers are designed to amplify the repeat regions and reveal repeat length polymorphisms
- Need a small amount of medium-quality DNA; for characterization studies the DNA should be high quality
- PCR-based, so cheap and easy to run
- Require much discovery and optimization for each species before use
- Have multiple alleles (co-dominant) and are very repeatable
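Repeat length polymorphism can be illustrated with a short Python sketch. It assumes a simple regular-expression definition of a microsatellite (a 2-5 bp unit repeated at least four times in tandem); the function name `find_ssrs`, the thresholds, and both allele sequences are hypothetical and chosen only for the example.

```python
import re

def find_ssrs(sequence, min_unit=2, max_unit=5, min_repeats=4):
    """Return (start, repeat_unit, repeat_count) for each tandem repeat found."""
    # Lazy quantifier: the shortest repeat unit that fits is reported.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(sequence)
    ]

# Two hypothetical alleles of one locus: same flanking sequence (where the
# PCR primers would anneal), different numbers of the CA repeat unit.
left_flank, right_flank = "TTGG", "TGGT"
allele_short = left_flank + "CA" * 6 + right_flank
allele_long = left_flank + "CA" * 9 + right_flank

print(find_ssrs(allele_short))
print(find_ssrs(allele_long))
print("Amplicon length difference:", len(allele_long) - len(allele_short), "bp")
```

The two amplicons differ by three repeat units (6 bp), which is the kind of size difference that gel or capillary systems must resolve when scoring SSR alleles.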

11 Examples of genotyping systems used for SSR analysis
[Figure panels: agarose gel-based SSR genotyping; PAGE gel-based SSR genotyping; semi-automated SSR genotyping; automated SSR genotyping using fluorescent labelling; stutter bands and multiple alleles]

12 Procedure of Diversity Array Technology (DArT)
[Figure panels: A, generating the array; B, genotyping the sample]

13 Diversity Array Technology (DArT)
- Relies on hybridization to previously defined clones
- Needs a small amount of very high-quality DNA
- Can be mechanized and gives many loci per run
- Does not have multiple alleles (dominant)
- Requires much discovery and optimization for each species before use
- Requires special equipment to visualize; generally better to outsource

14 Single Nucleotide Polymorphisms (SNPs)
- Are a change in the sequence of one base in the DNA (or, occasionally, a small insertion or deletion in the DNA sequence)
- Are the most basic polymorphism type: you are looking at the ACTUAL change, not a change in amplification or restriction pattern caused by it
- Need a small amount of medium- to high-quality DNA
- Can be mechanized, and there are many different ways to visualize the differences

15 Single Nucleotide Polymorphisms (SNPs)
- Do not have multiple alleles (only two, which are codominant)
- Require much discovery and optimization for each species before use
- Require special equipment to visualize; often better to outsource
- If many SNPs are run simultaneously, the cost per SNP is many times lower than for any other marker type
- May be functional markers (within genes of interest)
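As a rough illustration of what "looking at the actual change" means, the Python sketch below compares two aligned sequences base by base and then codes biallelic, codominant genotypes as counts of the non-reference allele. The sequences, the function names `find_snps` and `code_genotype`, and the 0/1/2 coding are assumptions made for the example, not part of the course material.

```python
def find_snps(reference, other):
    """Return (1-based position, reference base, other base) for each mismatch."""
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, other))
        if ref_base != alt_base
    ]

def code_genotype(genotype, ref_allele):
    """Count non-reference alleles in a diploid call such as 'CT' (0, 1, or 2)."""
    return sum(1 for allele in genotype if allele != ref_allele)

# Hypothetical aligned sequences from two inbred lines.
line_1 = "ATGCCGTAAGCT"
line_2 = "ATGCTGTAAGCC"

snps = find_snps(line_1, line_2)
print(snps)

# Codominance: the heterozygote (CT) is distinguishable from both homozygotes.
for call in ("CC", "CT", "TT"):
    print(call, "->", code_genotype(call, ref_allele="C"))
```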

16 Chemistry, demultiplexing, and detection options in SNP genotyping
[Figure columns]
- Enzyme chemistry: allele-specific extend + ligate; oligonucleotide ligation assay; single nucleotide primer extension; allele-specific hybridization; allele-specific PCR
- Demultiplexing: semi-homogeneous; solid-phase microspheres; homogeneous; capillary electrophoresis; solid-phase microarray
- Detection method: fluorescence; mass spectrometry; fluorescence resonance energy transfer (FRET); fluorescence polarization
- Platform/company: Illumina BeadArray; Luminex 100 flow cytometry; Sequenom iPLEX mass spectrometry; ABI SNPlex; microarray minisequencing; ABI TaqMan 5'-nuclease; ABI SNaPshot; DASH, amplicon Tm; Perkin-Elmer FP-TDI

17 Comparison of the five widely used DNA markers in plants

Feature | RFLP | RAPD | AFLP | SSR | SNP
Genomic coverage | Low-copy coding region | Whole genome | Whole genome | Whole genome | Whole genome
Amount of DNA required | 50-10 μg | 1-100 ng | 1-100 ng | ng | 50 ng
Quality of DNA required | High | Low | High | Medium high | High
Type of polymorphism | Single base changes, indels | Single base changes, indels | Single base changes, indels | Changes in length of repeats | Single base changes, indels
Level of polymorphism | Medium | High | High | High | High
Effective multiplex ratio | Low | Medium | High | High | Medium to high
Inheritance | Codominant | Dominant | Dominant/codominant | Codominant | Codominant
Type of probes/primers | Low-copy DNA or cDNA clones | Usually 10 bp random nucleotides | Specific sequence | Specific sequence | Allele-specific PCR primers
Technical demand | High | Low | Medium | Low | High
Radioactive detection | Usually yes | No | Usually yes | Usually no | No
Reproducibility | High | Low to medium | High | High | High
Time demand | High | Low | Medium | Low | Low
Automation | Low | Medium | High | High | High
Development/start-up cost | High | Low | Medium | High | High
Proprietary rights required | No | Yes and licensed | Yes and licensed | Yes and some licensed | Yes and some licensed
Suitable utility in diversity, genetics and breeding | Genetics | Diversity | Diversity and genetics | All purposes | All purposes

18 Perfect Markers for Genotyping
- Cheap to run, or give a lot of information per run
- MUST be very repeatable between assays, people, machines, and laboratories
- Low error rates (unambiguous to score)
- Many alleles are good, provided few of them are rare (high information content)
- High-throughput and automated genotyping systems
- Flexibility in data scoring, management, and analysis
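"High information content" is commonly quantified with expected heterozygosity and the polymorphism information content (PIC). The sketch below uses the standard textbook formulas, H = 1 - Σp² and PIC = 1 - Σp² - ΣΣ 2p_i²p_j² (summed over allele pairs); these measures are not named in the slides, so treat this as one possible way to make the point concrete. All allele frequencies are invented.

```python
from itertools import combinations

def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2) for allele frequencies that sum to 1."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    pair_term = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term

# Hypothetical markers (frequencies are illustrative only).
markers = {
    "biallelic, balanced": [0.5, 0.5],
    "four alleles, balanced": [0.25, 0.25, 0.25, 0.25],
    "four alleles, three rare": [0.97, 0.01, 0.01, 0.01],
}
for name, freqs in markers.items():
    print(f"{name}: H = {expected_heterozygosity(freqs):.3f}, PIC = {pic(freqs):.3f}")
```

The third marker has four alleles but scores lower than the balanced biallelic marker, which is why many alleles only help when few of them are rare.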

19 Genotyping Components
- Sampling tissues for DNA extraction
- Isolation of DNA
- Digestion, hybridization, and/or amplification of DNA into specific fragments
- Sizing, separating, or distinguishing DNA composition, fragments, allele combinations, or patterns
- Data management and analysis

20 Sampling Tissues for DNA Extraction
- Tissues: leaf, seed, culture products
- Sample treatment (drying, grinding, etc.)
- Sample storage
- Sample labeling
- Sample tracking
Requirements for small and large numbers of samples are very different. For a large number of samples, high-throughput and automated systems are required.

21 Isolation of DNA
- High-throughput DNA extraction systems
- Automated liquid handler
- Tissue sampling: seed chipping technology
- Tissue homogenization
- Plate-based
- Multichannel pipette
- Purification

22 Marker Assay
- Digestion, hybridization, and/or amplification of DNA into specific fragments
- Sizing, separating, or distinguishing DNA composition, fragments, allele combinations, or patterns
Methods
- Gel-based
- Capillary-based
- Array-based
- Sequence-based

23 Gel-based Marker Assay
- Agarose gel
- Polyacrylamide gel electrophoresis (PAGE)
Capillary-based Marker Assay
Capillary electrophoresis (CE), also known as capillary zone electrophoresis (CZE), can be used to separate ionic species by their charge, frictional forces, and mass. In traditional electrophoresis, electrically charged analytes move in a conductive liquid medium under the influence of an electric field. CE was designed to separate species based on their size-to-charge ratio in the interior of a small capillary filled with an electrolyte.

24 Array or Chip-based Marker Assay
A DNA microarray is a multiplex technology used in molecular biology. It consists of an arrayed series of thousands of microscopic spots of DNA oligonucleotides, called features, each containing picomoles (10^-12 moles) of a specific DNA sequence, known as probes (or reporters). A probe can be a short section of a gene or other DNA element that is used to hybridize a cDNA or cRNA sample (called the target) under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine the relative abundance of nucleic acid sequences in the target. Since an array can contain tens of thousands of probes, a microarray experiment can accomplish many genetic tests in parallel. Arrays have therefore dramatically accelerated many types of investigation.

25 A flowchart for a general microarray process
[Figure. Array preparation: EST database or cDNA library, PCR inserts from EST clones, multi-well plates, spotting. Sample preparation: Treatment 1 and Treatment 2, RNA 1 and RNA 2, Cy5-cDNA 1 and Cy5-cDNA 2. Then: hybridization, wash, dry, laser scanning.]

26 A flowchart for a general microarray process
[Figure: three pairs of panels, each labeled Treatment 1 and Treatment 2]

27 Sequencing-based Marker Assay
- Sequencing is a basic way to identify SNPs
- Skim sequencing: ~3X coverage, or partial random sequencing of a large-insert clone
- Deep sequencing: depth of coverage (number of times each base is read) of ~8X
- Sequencing-based SNP discovery: skim-sequencing all germplasm, or deep-sequencing all germplasm
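The difference between ~3X skim sequencing and ~8X deep sequencing can be made concrete with the Lander-Waterman approximation, under which the expected fraction of the genome covered by at least one read is 1 - e^(-c) for coverage depth c. This is a sketch under idealized assumptions (reads placed uniformly at random); the genome size and read length below are illustrative values chosen for the example, not figures from the course.

```python
import math

def reads_needed(coverage, genome_size_bp, read_length_bp):
    """Number of reads N such that N * L / G reaches the target coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)

def fraction_covered(coverage):
    """Lander-Waterman expectation: share of bases hit by at least one read."""
    return 1.0 - math.exp(-coverage)

GENOME_SIZE_BP = 2_300_000_000  # ASSUMPTION: a roughly maize-sized genome
READ_LENGTH_BP = 100            # ASSUMPTION: illustrative short-read length

for label, depth in (("skim", 3), ("deep", 8)):
    n_reads = reads_needed(depth, GENOME_SIZE_BP, READ_LENGTH_BP)
    covered = fraction_covered(depth)
    print(f"{label} ({depth}X): {n_reads:,} reads, ~{covered:.2%} of bases covered")
```

Under these assumptions roughly 5% of bases receive no read at 3X, whereas almost every base is read at least once at 8X, which is one reason deeper sequencing gives more reliable SNP calls.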

28 Data Management and Analysis
- Information collection
- Information integration
- Data standardization
- Development of generic databases
- Use of controlled vocabularies/ontologies
- Interoperable query system
- Redundant data condensing
- Database integration
- Tool-based information integration
- Information retrieval and mining
- Information management systems

29 Decision Support Tools
- Germplasm management, evaluation, and enhancement
- Breeding population management and improvement
- Building up heterotic patterns
- Prediction of hybrid performance
- Marker-assisted inbred and synthetic creation
- Genetic map construction
- Marker-trait association identification and validation
- Marker-assisted selection methodologies and implementation
- Genotype by environment interaction analysis
- Intellectual property rights and plant variety protection
- Breeding design through simulation and modeling

30 LIMS and Analytical Tools for Genetic Improvement
[Figure: integrated IMS for molecular breeding, with three columns]
Data
- Genotype: sequences, markers, maps, genealogy
- Phenotype: yield, quality, agronomy, stress response
- Environment: water, fertilizer, soil, temperature, precipitation, GIS, day length
Tools
- BLASTN/X, Mapmaker, MultiQTL, GeneFlow, QTL Cartographer, SAS/JMP, Structure, GeneMapper, PowerMarker, Arlequin, BiPlot, CMTV, ICIS
Output
- Gene functional analysis
- Genetic diversity
- Germplasm evaluation
- Germplasm classification
- Variety identification
- Genetic mapping
- Marker-trait association
- Marker-assisted selection
- GxE interaction
- Environmental classification
- Variety stability/adaptability

31 More about SNP Genotyping Platforms
Chemistry or allele discrimination techniques
- Sequencing
- Restriction enzyme digestion
- Direct hybridization
- Primer extension
- Allele-specific amplification
Detection systems
- Gel electrophoresis, capillary electrophoresis
- Flow cytometry, fluorimetry
- Mass spectrometry
- Real-time PCR, arrays or chips, etc.
Reaction formats
- Solution phase, solid support, bead arrays
Choosing the most suitable technology should take into account:
- Sensitivity
- Reproducibility
- Accuracy
- Capability of multiplexing
- Cost effectiveness
- Flexibility of use

32 High-throughput SNP Genotyping Platforms
GeneChip microarray technology from Affymetrix
The high-density mapping arrays produced by Affymetrix can simultaneously genotype about 500,000 manufacturer-selected SNPs per array. Genotyping a comparable number of user-selected SNPs would require an expensive and time-consuming redesign of array probes as well as a difficult reengineering of the DNA amplification protocol.
BeadArray technology from Illumina Inc.
The BeadArray technology combines a miniaturized array platform with a high level of assay multiplexing and scalable automation. The system uses high-density BeadArray technology in combination with an allele-specific extension, adapter ligation, and amplification assay protocol that achieves high multiplexing in a fully integrated production environment. The multiplexed assay detects up to 1536 SNPs in a single DNA sample and allows a researcher to determine over 140,000 genotype calls simultaneously.
Infinium Whole Genome Genotyping assay: the Maize Infinium 60K Array
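The "over 140,000 genotype calls" figure is consistent with running the 1536-SNP assay on many samples in parallel. The short calculation below assumes 96 samples per run; that number is not stated in the text and is used here only because it reproduces a total of the quoted magnitude.

```python
SNPS_PER_SAMPLE = 1536  # multiplex level stated in the text
SAMPLES_PER_RUN = 96    # ASSUMPTION: not stated in the text

def genotype_calls(snps_per_sample, samples):
    """Total genotype calls when every sample is scored at every SNP."""
    return snps_per_sample * samples

total_calls = genotype_calls(SNPS_PER_SAMPLE, SAMPLES_PER_RUN)
print(f"{total_calls:,} genotype calls per run")
```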

33 High-throughput SNP Genotyping Platforms
KBiosciences PCR SNP genotyping system
- Uses a unique form of allele-specific PCR
- Offers the simplest and most cost-effective way to determine SNP genotypes in the laboratory
- Is flexible, with the ability to perform direct or indirect assays
- Works well in 96-, 384-, or even 1536-well plate formats
- Assays can be easily designed and optimized by end users with the free Primer Picker software
- Is compatible with many detection instrument platforms, from conventional plate readers to ABI Prism instrumentation
The KASPar SNP option from KBiosciences appears to be the lowest-cost option for applications requiring genotyping of 10 or fewer loci. The cost would be about 16 euros per locus for samples, plus DNA extraction.

34 Next Generation Sequencing
- Chip-based SNP genotyping systems remain expensive for high-density genotyping on a large scale (i.e., breeding versus genetic analysis).
- Application of next-generation (NG) sequencing to maize and sorghum genotyping is well advanced. The costs will permit all lines in the breeding program to be genotyped, and genomic selection (GS) to be implemented on a large scale.
- NG sequencing based on the Illumina GA can process 96 samples for $4000. Multiplexing 10X could drop the per-sample reagent cost to $4-5.
- It will generate over 1 million SNPs and small indels.
- There is no need for prior SNP discovery, but the technology has not yet been proven.
- The platform requires well-curated inventories, pedigrees, unique GIDs, a user-friendly informatics platform, decision support tools, a sample tracking system, and automated allele calls.
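The per-sample figures follow from simple division. The sketch below assumes that "$4000 for 96 samples" is the reagent cost of one run and that 10X multiplexing means ten times as many samples share that cost; both readings are interpretations of the text, and the function name is hypothetical.

```python
RUN_COST_USD = 4000.0   # cost quoted in the text for 96 samples
SAMPLES_PER_RUN = 96    # samples per run quoted in the text
MULTIPLEX_FACTOR = 10   # "multiplexing 10X" as quoted in the text

def cost_per_sample(run_cost, samples, multiplex=1):
    """Reagent cost per sample when `multiplex` times more samples share a run."""
    return run_cost / (samples * multiplex)

baseline = cost_per_sample(RUN_COST_USD, SAMPLES_PER_RUN)
multiplexed = cost_per_sample(RUN_COST_USD, SAMPLES_PER_RUN, MULTIPLEX_FACTOR)
print(f"Without extra multiplexing: ${baseline:.2f} per sample")
print(f"With 10X multiplexing:      ${multiplexed:.2f} per sample")
```

Under this reading the multiplexed cost comes to a little over $4 per sample, in line with the $4-5 range quoted above.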

35 Marker-Assisted Breeding Platforms
I. Major gene introgression (target genes only)
- 2-10 markers for each trait
- Single trait introgression
- Multiple trait introgression
- A few markers for hundreds of plants
- TaqMan genotyping system, KBiosciences
II. Marker-assisted backcrossing (target genes plus background)
- 2-10 markers for each trait
- markers for background selection
- A few hundred markers for hundreds of plants
- Illumina genotyping system, KBiosciences
III. Whole genome or genome-wide assay
- 500 to several thousands or millions of markers for hundreds or thousands of plants/lines
- Illumina genotyping system, KBiosciences
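Background selection in marker-assisted backcrossing is usually motivated by the expected recovery of the recurrent parent genome. Without any selection, the expected proportion after n backcross generations is 1 - (1/2)^(n+1); background markers are used to find plants that exceed this average. The sketch below applies that standard expectation, which is not stated in the slides; the function name is hypothetical.

```python
def recurrent_parent_fraction(backcross_generations):
    """Expected recurrent-parent genome share after n backcrosses, no selection."""
    return 1.0 - 0.5 ** (backcross_generations + 1)

for n in range(1, 5):
    share = recurrent_parent_fraction(n)
    print(f"BC{n}: {share:.2%} recurrent parent genome expected")
```

Individual plants vary around these averages, so genotyping a few hundred background markers lets a breeder pick plants that are ahead of the expectation and save backcross generations.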

36 Maize SNP Chips in Use at CIMMYT and under Development
- Two 1536-SNP Cornell chips give good repeatability and a high rate of successful calls. The cost is about $85 per DNA sample, including reagents and the Cornell service charge. More than 800 high-quality SNPs can be used in F2 populations.
- Cornell, Illumina, and others have developed a high-density chip with about 60K SNPs. It costs about $160 per sample.

37 Genotyping Services
- The advent of SNP markers requires labs with high-tech equipment (robotics, automated SNP platforms, LIMS).
- Large private breeding companies (except for the very biggest) have moved to contract genotyping in specialized labs.
- The quality control, throughput, and redundancy of the best service providers are excellent.
- The GCP Molecular Breeding Platform has negotiated very good prices with several labs, and can handle contract details and enforcement.
- For the best service provider, quoted costs relevant to DTMA MARS projects are approximately $0.07-$0.08 per data point, including DNA extraction and assay development. The cost of genotyping 200 polymorphic loci would therefore be about $15 per line.
- When equipment costs are factored in, no in-house system is competitive.
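The $15 per line figure can be checked directly: 200 loci at $0.07-$0.08 per data point gives $14-$16, with $15 at the midpoint. The sketch below uses Python's `decimal` module to avoid floating-point rounding in currency arithmetic; the function name is hypothetical.

```python
from decimal import Decimal

def cost_per_line(n_loci, price_per_data_point):
    """Genotyping cost for one line scored at n_loci loci."""
    return n_loci * price_per_data_point

N_LOCI = 200
low = cost_per_line(N_LOCI, Decimal("0.07"))
high = cost_per_line(N_LOCI, Decimal("0.08"))
midpoint = (low + high) / 2

print(f"Cost per line: ${low} to ${high} (midpoint ${midpoint})")
```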

38 References
Xu, Y. (2010). Molecular Plant Breeding:
- 2. Molecular breeding tools: markers and maps (classical markers; DNA markers: RFLP, RAPD, AFLP, SSR, SNP, DArT; genic and functional markers)
- 3. Molecular breeding tools: omics and arrays (3.6 Array technologies in omics)
- 14. Breeding informatics
- 15. Decision support tools