New era for molecular breeding with cost-effective SNP genotyping solutions. Dr. Bhaswar Maity, Imperial Life Sciences, 18/2/2015, ICRISAT



2 DNA Sequencing to Genomic Selection


4 Plant Genomes up to 2013: incomplete (average completeness: 85%) and fragmented (average contiguity: 20 kb). Michael & Jackson (2013) The Plant Genome 6: 1-7.


6 Hyderabad was originally known as...? Hyderabad was originally known as Bhagyanagar, a city that Sultan Muhammad Quli of the Qutub Shahi dynasty founded and named after his beloved Bhagmati, or Bhagyamati. Once she entered the royal household and embraced Islam, she was rechristened Hydermahal, and as a natural consequence the city got its second name, Hyderabad.

7 Why does everybody want longer reads?

8 P6-C4: Read Length Performance
- N50 read length: >15 kb
- 95th percentile: >20 kb
- Maximum read length: >40 kb
- Throughput per SMRT Cell: >800 Mb
Conditions: P6-C4, 4 hr movie, 20 kb BluePippin size-selected E. coli (1 SMRT Cell).
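The read-length figures above are summary statistics over a run's read-length distribution. As a minimal sketch (the function names and the nearest-rank percentile convention are our own choices, not PacBio's software), N50 and a percentile can be computed like this:

```python
import math


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percentile(lengths, pct):
    """Nearest-rank percentile (0 < pct <= 100) of the read lengths."""
    if not lengths or not 0 < pct <= 100:
        raise ValueError("need reads and 0 < pct <= 100")
    ordered = sorted(lengths)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


# Toy read lengths in bp (illustrative, not data from the slide).
reads = [2_000, 4_000, 6_000, 8_000, 10_000]
print(n50(reads))             # 8000
print(percentile(reads, 95))  # 10000
print(max(reads))             # 10000
```

N50 is weighted by bases rather than by read count, which is why it sits well above the median for long-tailed read-length distributions.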

9 200,000+ SNPs Were Missed in Short-Read Assemblies
[Figure: SNPs from Illumina paired-end (ILMN PE) and PacBio assemblies of Arabidopsis Ler0 and Cvi, mapped to TAIR10; values shown: 27,106; 55, ,836; 95%/68%; 685,104; 92%/72%; 238,637; 271,335.] In collaboration with Joe Ecker at the Salk Institute for Biological Studies.
"Not only did PacBio discover pretty much everything that Illumina paired-end reads were able to find, in this case 95% and 92%, it identified another 250,000 of these variants." Chongyuan Luo, Ph.D., The Salk Institute for Biological Studies, "Resolving the Complexity of Genomic and Epigenetic Variations in Arabidopsis", PAG 2014 Workshop (blog.pacificsciences.com)
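The 95% and 92% figures quoted above are recovery rates: the share of short-read SNP calls that the PacBio assembly also found. A minimal sketch of that bookkeeping, assuming each SNP is keyed by a hypothetical (chromosome, position, alternate allele) tuple; the second percentage on the slide is not defined in the source, so the reverse ratio computed here is only one plausible reading:

```python
def overlap_stats(short_read_snps, long_read_snps):
    """Compare two SNP call sets; each SNP is a hashable key such as (chrom, pos, alt)."""
    shared = short_read_snps & long_read_snps

    def frac(part, whole):
        return len(part) / len(whole) if whole else 0.0

    return {
        "shared": len(shared),
        "short_only": len(short_read_snps - long_read_snps),
        "long_only": len(long_read_snps - short_read_snps),
        # Fraction of short-read SNPs that the long-read assembly also found.
        "short_recovered": frac(shared, short_read_snps),
        # Fraction of long-read SNPs that short reads also found.
        "long_shared": frac(shared, long_read_snps),
    }


# Toy call sets (hypothetical, not the Salk data).
illumina = {("Chr1", 100, "A"), ("Chr1", 200, "G"), ("Chr2", 50, "T"), ("Chr2", 75, "C")}
pacbio = {("Chr1", 100, "A"), ("Chr1", 200, "G"), ("Chr2", 50, "T"),
          ("Chr3", 10, "G"), ("Chr3", 20, "A"), ("Chr4", 5, "T")}
stats = overlap_stats(illumina, pacbio)
print(stats["short_recovered"], stats["long_only"])  # 0.75 3
```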

10 Resolve Gene Duplications in Difficult BACs
Aluminum tolerance in maize is important for drought resistance and for protection against nutrient deficiencies. A segregating population localized a QTL to a BAC, but the region could not be genotyped with short-read sequencing because of its high repeat content and GC skew. Assembling the BAC with PacBio long reads revealed a triplication of the ZmMATE1 membrane transporter gene.
[Figure: genomic organization of the MATE1 locus.]
Maron, LG et al. (2012) A rare gene copy-number variant that contributes to maize aluminum tolerance and adaptation to acid soils. PNAS

11 Resolve Difficult Genomic Regions
Novel patterns of higher-order repeat structures in switchgrass centromeres. Melters et al. (2013) Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biology, 14:R10

12 PacBio's Iso-Seq Method for High-quality, Full-length Transcripts
Experimental pipeline: polyA mRNA, cDNA synthesis with adapters, PCR amplification, size partitioning, SMRTbell ligation, PacBio RS II sequencing. cDNA structure: 5' primer, coding sequence, polyA tail, 3' primer.
Informatics pipeline: PacBio raw sequence reads, reads of insert, removal of adapters and artifacts (clean sequence reads), read clustering (isoform clusters), consensus calling (nonredundant transcript isoforms), quality filtering (final isoforms), mapping to the reference genome, evidence-based gene models.
Resources: SampleNet: Iso-Seq Method with Clontech cDNA Synthesis Kit; DevNet: Iso-Seq wiki page.

13 Axiom Genotyping Arrays: best for your breeding program, now and tomorrow

14 Largest portfolio of catalog arrays in agrigenomics: 22 catalog designs

15 Axiom Genotyping Publications

16 Routine Use: 384-format
Robust, medium-density, very-high-throughput, cost-effective assay for routine use across all animals.
Breeder arrays (Axiom 384-format): high throughput, ideal for genotyping applications in breeding; improves imputation accuracy; low cost.
Key features: ~50,000 markers per array; 3,000+ samples/week throughput.

17 Demystifying Expert Array Designs
Genotyping microarrays for 1, ,000 SNPs and indels:
- Pre-designed arrays: Catalog (off the shelf) and Expert Designs (custom arrays available to anyone)
- MyDesign custom arrays: any species; 384-array plate: 1,500-50,000 markers; 96-array plate: 50, ,000 markers
Custom arrays are the largest revenue drivers of the Ag genotyping business!

18 MyDesign Arrays Span All Application Needs
[Figure: price per sample versus SNP array density (675K, 200K, 90K, 50K, 1.5K), with multiplexing running from low to high.]
- 675K-200K markers: discovery
- 200K-90K markers: genotype-trait association
- 90K-50K markers: selection
- 50K-1.5K markers: screening
480-sample minimum volume. More SNP content for no additional cost: same prices within SNP tiers.

19 Axiom catalog and custom designs
Catalog designs
- Plants: Wheat, Maize, Rice, Soybean, Lettuce, Pepper, Apple, Strawberry, Rose, Cotton
- Animals: Bovine, Buffalo, Chicken, Equine, Goat, Mouse, Ovine, Porcine, Rat, Salmon, Trout, Turkey
Custom designs
- Crops, vegetables, fruits & trees: 26+ species; farm animals: 8+ species; aquaculture: 8+ species
- Animals: Bovine, Buffalo, Carp, Catfish, Chicken, A. aegypti, Eel, Great tit, Herring, Mouse, Pig, Rat, Salmon, Sea lice, Trout, Yellowtail
- Plants: Barley, Apple, Alstroemeria, Lily, Chrysanthemum, Japanese Cedar, Lettuce, Maize, Potato, Rice, Rose, Rye, Sorghum, Soybean, Strawberry, Sunflower, Tobacco, Tomato, Wheat, Brassica, Rapeseed, Pine, Spruce, Cabbage

20 Axiom Expert Design arrays
Test drive the experience: available to all!
[Table: Expert Design arrays grouped by density tier (>200k, 90k-200k, 70k-90k, 50k); species listed: Bovine, Maize, Salmon, Soybean, Buffalo, Strawberry, Trout, Cotton, Chicken, Equine, Wheat, Apple, Q Wheat Axiom 384, Trout, Rose, Goat, Q Ovine, Q Rice 44k, Q Porcine, Q Maize 50k.]

21 Screening Arrays: Validate sequencing discoveries
- Select markers using in silico design
- Validate discoveries across multiple samples
- Sub-select markers
- Add new markers
- Select markers from multiple breeds

22 Unique Advantage: Multi-Species Format
Greater flexibility: choose a cost-effective, fast solution. Example species: rice, chickpea, pigeon pea, maize.

23 Fast-track genomic selection using arrays
Genotyping-by-sequencing (GBS) experiments take longer to run; data analysis requires bioinformatics staff and can take weeks or months to complete.

24 GBS Missing Data
- 40-60% missing data
- 30-50% errors in calling heterozygotes
- <1,000 markers common between samples
Relationship between the number of markers available using genotyping-by-sequencing techniques, the proportion of missing data and the cost per sample, as observed with wheat (doi: /plantgenome).
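Per-sample missingness and the number of markers shared by all samples are reported separately because even moderate missing rates, scattered across different markers, leave few markers called in every sample. A minimal sketch on a toy genotype matrix (hypothetical data and function names, with None standing for a missing call):

```python
def missing_rate(calls):
    """Fraction of markers with no call (None) for one sample."""
    return sum(c is None for c in calls) / len(calls)


def markers_called_in_all(matrix):
    """Indices of markers that have a call in every sample.

    matrix: list of per-sample call lists, all aligned by marker index.
    """
    n_markers = len(matrix[0])
    return [i for i in range(n_markers)
            if all(row[i] is not None for row in matrix)]


# Toy genotypes for 3 samples x 5 markers; None = missing call.
samples = [
    ["AA", "AB", None, "BB", None],
    [None, "AB", "AA", "BB", "AB"],
    ["AA", None, "AB", "BB", None],
]
print([missing_rate(s) for s in samples])  # [0.4, 0.2, 0.4]
print(markers_called_in_all(samples))      # [3]
```

Here no sample misses more than 40% of calls, yet only one of five markers is usable across all three samples.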

25 No Beads! No Missing SNPs
- 100% custom content on Axiom, 0% SNP dropout
- The Affymetrix array manufacturing process ensures no batch effects
- Customer content is present on the array, EVERY time
[Figure: bead-based arrays across manufacturing events (initial manufacture: first bead pool, ~5% bead loss from pool; subsequent manufacture events: bead pools 2 and 3); % drop in SNPs common across the 3 bead pools.]
Bead-based arrays show a ~5-20% difference in SNP content between batches; for example, see Eeles et al., Nat Genet 40: (2008).
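The "% drop" metric compares the designed content with the SNPs that survive in every manufactured batch. A minimal sketch with made-up SNP IDs and loss patterns (illustrative only, not data from Eeles et al.; the function names are hypothetical):

```python
def common_across_batches(batches):
    """SNPs present in every manufactured batch (bead pool)."""
    return set.intersection(*(set(b) for b in batches))


def percent_drop(designed, batches):
    """Percent of designed SNPs missing from at least one batch."""
    common = common_across_batches(batches) & set(designed)
    return 100.0 * (len(designed) - len(common)) / len(designed)


designed = set(range(100))                 # 100 designed SNP IDs
pool1 = designed - set(range(0, 5))        # loses SNPs 0-4
pool2 = designed - set(range(3, 10))       # loses SNPs 3-9
pool3 = designed - set(range(8, 12))       # loses SNPs 8-11
print(len(common_across_batches([pool1, pool2, pool3])))  # 88
print(percent_drop(designed, [pool1, pool2, pool3]))      # 12.0
```

Each pool loses only 4-7% of content, but because the losses fall on different SNPs, the set usable across all three pools shrinks by 12%.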

26 Polyploidy & Informatics
What is it? More than 2 sets of chromosomes (humans are diploid, with 2 sets).
Why do we care? Non-human species can have complex genomes and varying ploidy levels. Polyploidy is especially common in plants: wheat is hexaploid, and strawberry is octoploid!
What are Affy's capabilities? Polyploidy complicates genotyping because it causes cluster compression.
[Figure: genotype clusters for a diploid (bovine) versus a polyploid (wheat).]
Axiom is capable of genotyping allopolyploids and SOME autopolyploids. Axiom is the only high-density genotyping platform that can automatically call genotypes from polyploid species!
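Cluster compression can be illustrated with a simplified dosage model: if a probe also hybridizes to homeologous copies that do not carry the SNP, the allele-signal ratios of the genotype classes crowd together. The sketch below assumes perfectly linear, equal signal per copy and assumes the extra homeologs are fixed for allele A. Real arrays do not behave this ideally, and this is not Axiom's calling algorithm.

```python
from fractions import Fraction


def expected_b_fractions(locus_copies, homeolog_copies=0):
    """Expected B-allele signal fraction for each possible B dosage.

    Idealized model: every copy the probe binds contributes equal signal;
    homeolog_copies are extra, monomorphic copies assumed fixed for allele A.
    """
    total = locus_copies + homeolog_copies
    return [Fraction(b, total) for b in range(locus_copies + 1)]


def cluster_spacing(fractions):
    """Gap between adjacent genotype clusters (equal spacing in this model)."""
    return fractions[1] - fractions[0]


diploid = expected_b_fractions(2)                    # AA, AB, BB
wheat = expected_b_fractions(2, homeolog_copies=4)   # SNP in one of 3 subgenomes
print([str(f) for f in diploid], cluster_spacing(diploid))  # ['0', '1/2', '1'] 1/2
print([str(f) for f in wheat], cluster_spacing(wheat))      # ['0', '1/6', '1/3'] 1/6
```

In this model, the three genotype clusters of a hexaploid SNP span only a third of the signal range, three times closer together than in a diploid.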



29 Automated SNP calls for Polyploid Crops

30 Axiom genotyping categories: setting the gold standard for displaying results



33 Feature comparison: Affymetrix GeneTitan Axiom technology vs. Technology X
- Arrays production. Axiom: arrays are synthesized using highly sensitive photolithographic manufacturing technology.
- Consistency. Axiom: 100% identical SNP content at any time and for as long as research necessitates. Technology X: batch-to-batch SNP loss from the bead pool.
- Arrays supply. Axiom: continuous supply of plates, which can be ordered at any time with 100% reproducibility among batches. Technology X: inconsistent supply of arrays; bead pools expire in 12 months.
- Support for polyploid genomes. Axiom: an automated genotype-calling algorithm analyzes diploid and polyploid genomes without the need for manual data editing. Technology X: manual genotype checking and correction are required. This is labor intensive and time consuming, as it can take a whole day for every 1,000 SNPs that need checking, a considerable analysis burden even for a low-density SNP panel.
- Marker selection freedom. Axiom: the Axiom assay tolerates a single base-pair mismatch outside a 10 bp window around the SNP interrogation site. Technology X: the Infinium assay does not support interfering SNPs within 60 bp of the SNP of interest.
- Minimum commitment for customization. Axiom: only 480 arrays. Technology X: minimum 1,152 arrays.
- Future arrays/versions. Axiom: flexibility to select SNPs from the Affymetrix database, user-defined SNPs and SNPs of the initial array, and to redesign the array at a very high conversion rate, minimizing batch-to-batch variability.

34 Axiom Features vs. Other Platform
- No SNP dropouts: semiconductor-based photolithographic technology; all designed markers are accessible. (Other platform: missing valuable data.)
- More marker selection flexibility: compatible with interfering SNPs 10 bp away from the candidate SNP [figure: candidate SNP and neighboring SNP]; proven indel calling; multi-species design. (Other platform: inability to target specific markers.)
- Automated genotyping analysis: automated analysis of diploid and polyploid organisms; no manual editing required. (Other platform: tedious manual analysis.)
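During marker selection, the window rules compared above amount to checking each candidate SNP for neighboring polymorphisms. A minimal sketch that takes the 10 bp and 60 bp figures from the slides as the window parameter; the real vendor design pipelines apply further criteria not modeled here, and both the single-chromosome input and the inclusive window boundary are assumptions:

```python
from bisect import bisect_left, bisect_right


def flag_interfering(candidates, known_snps, window_bp):
    """Return candidate positions with another known SNP within +/- window_bp.

    All positions are assumed to lie on the same chromosome.
    """
    known = sorted(known_snps)
    flagged = []
    for pos in candidates:
        lo = bisect_left(known, pos - window_bp)
        hi = bisect_right(known, pos + window_bp)
        # Ignore the candidate itself if it is also in the known list.
        if any(p != pos for p in known[lo:hi]):
            flagged.append(pos)
    return flagged


candidates = [1_000, 5_000, 9_000]
known = [1_000, 1_008, 5_000, 5_045, 9_000]
print(flag_interfering(candidates, known, window_bp=10))  # [1000]
print(flag_interfering(candidates, known, window_bp=60))  # [1000, 5000]
```

Widening the exclusion window from 10 bp to 60 bp removes an extra candidate here, which is the practical sense in which a narrower window gives more marker selection freedom.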

35 MassARRAY Complements High-throughput Sequencing & Genomic Selection
- de novo discovery: novel SNPs, somatic mutations, RNA-Seq, Meth-Seq, CNVs
- Validation & translation: custom assays, multiplexing, QC/tracking; high throughput, low cost
The MassARRAY System is for Research Use Only. Not for use in diagnostic procedures.
Improving healthcare through revolutionary genetic analysis solutions

36 Agena MassARRAY
Processes: biochemistry, miniaturized SpectroCHIP, mass spectrometry, data analysis packages.
Applications: genotyping, methylation analysis, quantitative gene analysis, comparative sequence analysis.
All Sequenom products and assays are for research use only. Not for use in diagnostic procedures.

37 Distinct Advantages of MassARRAY for Nucleic Acid Analysis
1. No fluorescence: the mass of the actual bioanalyte is detected, with 4-decimal-place accuracy. There are no non-specific background issues, because background has a different mass. Sensitivity is high: PCR amplification can be pushed to the maximum, and single-molecule detection is possible.
2. The system is quantitative: many biological phenomena, such as allele ratios, gene copy number, gene expression and methylation, need to be accurately quantified.
3. Multiplexing, enabled by a wide mass window and a high-resolution detector: high throughput; simple, flexible assay design with little optimization required; cost savings.
4. Flexibility: small, medium and large-scale studies; numbers of samples and markers are easily scaled; simple assay design and reagent ordering.
Comprehensive genetic analysis: genetic (SNPs), transcriptome (gene expression) and epigenetic (methylation).

38 Adopted by leading centers engaged in basic, translational, clinical and agricultural research
- 300+ systems worldwide
- peer-reviewed publications to date
- 800k+ samples and ~1.2B genotypes analyzed on the MassARRAY system in the past year

39 iPLEX Gold Genotyping: Rapid and Easy Workflow

40 iPLEX Biochemistry: SNP Genotyping & Mutation Detection
iPLEX Gold for general research genotyping:
- Up to 40-plex reactions as standard
- Low cost per genotype
- User assay design or Assays by Agena
- High assay design yield (>90%) with high genotyping call rates (>98%)
- High accuracy: published 99.7%
- Assay requires low-cost plain oligos
- Can handle insertions/deletions (complex mutations)
Small- to medium- to high-throughput capability:
- 24-, 96- or 384-well plate format
- 100s to 1,000s of samples per day
Very flexible:
- Incorporating new assays and re-plexing
- Flexible study design in 384- or 96-well formats
- Rapidly design new assays and order oligos
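Call rate and accuracy, as quoted above, are simple ratios, and plex level times plate format bounds the number of genotypes per plate. A minimal sketch with toy genotypes; measuring accuracy as concordance against reference calls is our assumption about how such figures are usually derived, not Agena's stated method, and the function names are hypothetical:

```python
def call_rate(calls):
    """Fraction of assays that produced a genotype (None = no call)."""
    return sum(c is not None for c in calls) / len(calls)


def concordance(calls, reference):
    """Agreement with reference genotypes, over positions called in both."""
    pairs = [(c, r) for c, r in zip(calls, reference)
             if c is not None and r is not None]
    if not pairs:
        raise ValueError("no positions called in both sets")
    return sum(c == r for c, r in pairs) / len(pairs)


def genotypes_per_plate(plex, wells):
    """Upper bound: one sample per well, every assay in the multiplex called."""
    return plex * wells


calls = ["AA", "AG", None, "GG", "AA"]
reference = ["AA", "AG", "AG", "GG", "AG"]
print(call_rate(calls))               # 0.8
print(concordance(calls, reference))  # 0.75
print(genotypes_per_plate(40, 384))   # 15360
```

Concordance is computed only over positions called in both sets, so no-calls lower the call rate without also being counted as errors.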

41 Thank You

42 Most devotees' wishes at the Chilkur Balaji temple are visa-related, so the deity is also referred to as 'Visa' Balaji.