Anker P Sørensen Crop innovation through novel NGS applications

1 Anker P Sørensen: Crop innovation through novel NGS applications. Session 1: NGS / Omic Technologies for Plant Research, 19 March 2015, Kuala Lumpur, Malaysia

2 KeyGene's corporate profile. Locations: Wageningen (NL), Rockville (US), Shanghai (China). Inspiring partnerships: communicative & connective; creative & unconventional problem solver; co-development & entrepreneurial.

3 25 years of successful partnerships: strategic shareholders, long-term partnerships.

4 Crop Innovation with KeyGene: genomic tools & methods to harvest the promise of plant genes, addressing fungal disease, drought, bacteria, salinity, virus, quality & health, insects and non-food traits.

5 Crop Innovation with KeyGene: sequence-based breeding. Crop-specific tools; trait elucidation, trait prediction and trait improvement. Proprietary sequence-based technologies & products: genome sequencing, transposon screening, marker discovery, genotyping, mutation screening, transcript sequencing/profiling, amplicon sequencing, Whole Genome Profiling.

6 DNA reference sequences of crops: Tomato, Potato, Cucumber, Melon, Pepper, Watermelon, Brassica rapa, Brassica oleracea, Lettuce, Chickpea, Rice, Sugar beet, Wheat, Peach, Chinese plum, Pear, Apple, Papaya, Orange, Date palm, Banana, Bean, Pigeon pea, Cassava, Oilseed rape, Poplar, Hevea, Jatropha, Castor bean, Soybean, Medicago, Corn, Sorghum, Sunflower, Flax, Cotton, Moso bamboo, Cacao, Barley, Grape, Coffee, Tobacco, Strawberry.

7 Whole Genome Profiling: the benefits. A 10x BAC library yields BAC contigs and singleton BACs, with average N50 contig sizes of ~2-4 Mbp, forming a sequence-based physical map. Direct access to individual BAC clones. BAC contigs provide sequence-based anchor points for the assembly of "XXL" scaffolds. Single or low-copy only: repeat sequences are eliminated by deconvolution.
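N50 is the contig-size statistic quoted here and on the following slides. As a minimal sketch (an illustration, not KeyGene's WGP software), N50 is the length L such that contigs of length at least L together cover at least half of the total assembly; the function name `n50` and the toy lengths are hypothetical:

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    covered = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0


if __name__ == "__main__":
    # Toy contig lengths in bp (illustrative, not data from the slides).
    print(n50([2_000_000, 3_500_000, 1_200_000, 800_000, 4_000_000]))  # 3500000
```

Some tools also report how many contigs were needed to reach the half-way mark (often called L50); that is simply the number of loop iterations at the point of return.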

8 WGP track record (species, genome size in Mbp where legible, combination with WGS): Arabidopsis, Strawberry, Wheat chromosome arms, Cucumber, Melon, B. rapa, Carrot (470), Radish (573), B. oleracea, Potato, Tomato, Eggplant (1000), B. napus, Cotton (1700), Lettuce, Pepper, Sunflower, Tobacco, Wheat genome. KeyGene and Amplicon Express have generated over 40 physical maps for 27 different species of varying genome size (130 Mbp to 5000 Mbp) and complexity.

9 WGP 2.0 Melon: integration of the PacBio assembly with WGP 2.0 BAC contigs. The de novo PacBio assembly was integrated with 6x to 12x WGP BAC contigs.

Parameter              PacBio (44x)  6x WGP      8x WGP      10x WGP     12x WGP
Largest size (bp)      2,492,087     7,809,249   10,427,784  9,373,185   7,987,312
Average size (bp)      63,317        1,286,017   1,510,451   1,644,480   1,658,497
N50 size (bp)          387,751       2,108,034   2,379,643   2,563,392   2,686,294
% integrated bases     -             87.30%      86.30%      85.70%      85.20%
Fewer contigs          -             23x         27x         30x         31x
Larger N50             -             5.4x        6.1x        6.6x        6.9x

Not fully legible: total contigs/superscaffolds (PacBio 6,…), total bases (PacBio 400,924,…), smallest size (…,501; 31,655; 34,501; 39,515), N50 index, integrated WGS contigs.
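The improvement in N50 follows directly from the N50 row: each WGP-integrated N50 divided by the PacBio-only N50 of 387,751 bp. A short check in Python using only the values on the slide (the constant and function names are hypothetical):

```python
# N50 sizes (bp) copied from the melon table.
PACBIO_N50 = 387_751
WGP_N50 = {"6x": 2_108_034, "8x": 2_379_643, "10x": 2_563_392, "12x": 2_686_294}


def fold_improvement(new: int, baseline: int) -> float:
    """Ratio of a new N50 to the baseline N50, rounded to one decimal."""
    return round(new / baseline, 1)


if __name__ == "__main__":
    for coverage, value in WGP_N50.items():
        # Prints 5.4, 6.1, 6.6 and 6.9, matching the slide.
        print(coverage, fold_improvement(value, PACBIO_N50))
```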

10 Cotton genome sequencing: integration of the PacBio assembly with WGP 2.0 BAC contigs (contigs to superscaffolds). KeyGene draft genome sequencing of tetraploid G. hirsutum: 38x coverage PacBio SMRT sequencing plus 10x Whole Genome Profiling (WGP).

Input: PacBio 38x coverage
Number of contigs                   21,777
Total bases                         2,028,005,861
Total bases without N's             2,028,005,861
Largest sequence                    2,328,770
Average sequence                    93,126
N50                                 218,948

Input: PacBio 38x coverage + 10x WGP BAC contigs
Total superscaffolds                1,207
Total WGS bases integrated          1,692,969,385
Number of WGS contigs integrated    14,789
Maximum superscaffold size          12,561,139
Average superscaffold size          1,402,626
Minimum superscaffold size          8,509
N50 size                            2,329,392
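As a consistency check on the cotton figures (plain arithmetic on the slide's numbers, nothing more), the reported averages equal total bases divided by sequence count, and the integrated bases correspond to roughly 83.5% of the PacBio input. The constant and function names are hypothetical:

```python
# Figures copied from the cotton slide (PacBio 38x alone, and with 10x WGP).
PACBIO_TOTAL_BASES = 2_028_005_861
PACBIO_CONTIGS = 21_777
WGS_BASES_INTEGRATED = 1_692_969_385
SUPERSCAFFOLDS = 1_207


def mean_length(total_bases: int, n_sequences: int) -> int:
    """Average sequence length, rounded to the nearest bp."""
    return round(total_bases / n_sequences)


def integrated_fraction(integrated: int, total: int) -> float:
    """Share of input bases placed in superscaffolds, as a percentage (1 decimal)."""
    return round(100 * integrated / total, 1)


if __name__ == "__main__":
    print(mean_length(PACBIO_TOTAL_BASES, PACBIO_CONTIGS))    # 93126, average PacBio contig
    print(mean_length(WGS_BASES_INTEGRATED, SUPERSCAFFOLDS))  # 1402626, average superscaffold
    print(integrated_fraction(WGS_BASES_INTEGRATED, PACBIO_TOTAL_BASES))  # 83.5
```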

11 Assembly of BACs using PacBio only: 8 overlapping BACs (… kb each) yield a contig of ~880 kb. Each BAC was sequenced in one SMRT cell and assembled into a single contig; a single, contiguous sequence was derived from the overlaps. [Figure: 454 sequencing vs. a single PacBio contig]
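The slide describes joining overlapping BAC assemblies into one sequence. A minimal sketch of that idea, assuming the BACs are already ordered along a tiling path (for example from a WGP physical map) and overlap exactly; real assemblers tolerate sequencing errors, so this is illustrative only. `merge_pair`, `merge_tiling_path` and `min_overlap` are hypothetical names:

```python
def merge_pair(left: str, right: str, min_overlap: int) -> str:
    """Join two sequences on their longest exact suffix/prefix overlap.

    Raises ValueError when no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("no sufficient overlap between adjacent sequences")


def merge_tiling_path(contigs: list[str], min_overlap: int = 20) -> str:
    """Merge contigs that are already ordered along a tiling path."""
    merged = contigs[0]
    for contig in contigs[1:]:
        merged = merge_pair(merged, contig, min_overlap)
    return merged


if __name__ == "__main__":
    # Toy "BAC contigs" with 6-base overlaps (not real data).
    print(merge_tiling_path(["AAACCCGGG", "CCCGGGTTT", "GGGTTTAAA"], min_overlap=3))
```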

12 Assembly of fungal genomes using PacBio only: T. emersonii (~30 Mb) was sequenced in 16 SMRT cells and assembled into 8 long contigs spanning ~30 Mb; 7 additional contigs were contaminants and over-assembled repeats.

13 Towards finished plant genomes: targeted gap filling (KeyGene method). Model: a 1 Mbp region of maize chromosome 4 (between the 37 Mbp and 38 Mbp marks in the figure). In the target region, 64 gaps were selected and each was artificially set at 100 bp.
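One plausible way to build a test set like the one described (an assumption; the slide does not show KeyGene's procedure) is to take a finished region, replace each selected interval with a fixed-length run of Ns, and later compare the filled sequence against the removed original. The function `insert_fixed_gaps` is hypothetical; coordinates are 0-based and half-open:

```python
def insert_fixed_gaps(sequence: str, gaps: list[tuple[int, int]], gap_size: int = 100) -> str:
    """Replace each [start, end) interval with a run of `gap_size` Ns.

    Intervals must not overlap; they are processed in coordinate order.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(gaps):
        if start < cursor:
            raise ValueError("gap intervals overlap")
        pieces.append(sequence[cursor:start])  # keep sequence before the gap
        pieces.append("N" * gap_size)          # fixed-size placeholder gap
        cursor = end
    pieces.append(sequence[cursor:])           # keep sequence after the last gap
    return "".join(pieces)


if __name__ == "__main__":
    print(insert_fixed_gaps("ACGTACGTAC", [(2, 5)], gap_size=3))  # ACNNNCGTAC
```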

14 Towards finished plant genomes: targeted gap filling results. Model: maize chr. 4, 1 Mbp region, 64 gaps artificially set at 100 bp. Example: a 100 bp gap filled at position ~63,500 in the 1 Mbp region. Data: 1.4 Gbp (727k PacBio reads). PBJelly gap analysis (single pass): 25 filled (closed or reduced); 18 overfilled; 5 extended; 16 no fill / failed.
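The four PBJelly outcome classes account for all 64 test gaps, and the data volume implies an average PacBio read length of roughly 1.9 kb. A short tally in Python using only the slide's numbers (names are hypothetical, and 1.4 Gbp is a rounded figure, so the mean read length is approximate):

```python
from collections import Counter

# PBJelly single-pass outcomes for the 64 maize test gaps, as reported on the slide.
outcomes = Counter({
    "filled (closed or reduced)": 25,
    "overfilled": 18,
    "extended": 5,
    "no fill / failed": 16,
})

# Mean read length implied by ~1.4 Gbp of data in 727k reads.
MEAN_READ_LENGTH = 1.4e9 / 727_000


def summarise(counts: Counter) -> dict[str, float]:
    """Percentage of gaps per outcome category, rounded to one decimal."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


if __name__ == "__main__":
    print(sum(outcomes.values()))  # 64 gaps in total
    print(summarise(outcomes))
    print(round(MEAN_READ_LENGTH))  # about 1926 bp
```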

15 Multi Genome Variation

16 Multivariome analysis of re-sequenced germplasm collections: 150 tomato lines. Panels: SNP distribution on chr. 1; CNV of 19 genes on chr. 1. Visualize selected SNPs, indels & CNVs.
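The slide does not say how the CNVs were called. A common read-depth approach, sketched here as an assumption, scales a gene's mean coverage by the genome-wide median coverage and by the ploidy (2 for diploid tomato). `copy_number` and its parameters are hypothetical:

```python
from statistics import median


def copy_number(gene_depth: float, background_depths: list[float], ploidy: int = 2) -> float:
    """Read-depth copy-number estimate: ploidy * gene depth / median background depth."""
    baseline = median(background_depths)
    if baseline <= 0:
        raise ValueError("background depth must be positive")
    return ploidy * gene_depth / baseline


if __name__ == "__main__":
    # A gene covered at twice the background depth suggests ~4 copies in a diploid.
    print(copy_number(60.0, [28.0, 30.0, 32.0, 30.0, 29.0]))  # 4.0
```

Real CNV callers also correct for GC content and mappability; this sketch only shows the core ratio.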

17 Whole Genome Genetics: annotated SNPs of re-sequenced germplasm collections. SNPs aligned to the WGS (panels: Chr1, Chr2, Chr7); variation differences between subpopulations (cultivated vs. wild).

18 Whole Genome Genetics: annotated SNPs of re-sequenced germplasm collections. Gene haplotype variation in subpopulations and/or generations (elite vs. wild).

19 Whole Genome Genetics: annotated SNPs of re-sequenced germplasm collections. Variation in haplotype frequency shifts [plot: Δ freq against physical distance]. Large shifts in allele frequency indicate selection.
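The Δ freq plot compares allele frequencies between subpopulations along the genome. A minimal sketch, assuming diploid dosage calls (0, 1, 2) per SNP and an arbitrary illustrative threshold of 0.5; `allele_frequency` and `frequency_shifts` are hypothetical names, and in practice shifts would be summarised in windows along the physical map:

```python
def allele_frequency(genotypes: list[int]) -> float:
    """Alternative-allele frequency from diploid dosage calls (0, 1 or 2)."""
    return sum(genotypes) / (2 * len(genotypes))


def frequency_shifts(pop_a: dict[int, list[int]], pop_b: dict[int, list[int]],
                     threshold: float = 0.5) -> list[tuple[int, float]]:
    """Return (position, delta) for shared SNPs whose frequency differs by >= threshold."""
    flagged = []
    for position in sorted(pop_a.keys() & pop_b.keys()):
        delta = allele_frequency(pop_b[position]) - allele_frequency(pop_a[position])
        if abs(delta) >= threshold:
            flagged.append((position, round(delta, 3)))
    return flagged


if __name__ == "__main__":
    # Toy genotype calls per SNP position (not data from the slides).
    wild = {100: [0, 0, 1, 0], 200: [1, 1, 1, 1], 300: [0, 0, 0, 0]}
    elite = {100: [2, 2, 2, 1], 200: [1, 1, 1, 1], 300: [0, 1, 0, 0]}
    print(frequency_shifts(wild, elite))  # [(100, 0.75)]
```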

20 Sequence-Based Genotyping (SBG): the application areas. Genetic analysis with SBG enables a direct link to genomics-based gene analysis and genetics-based discovery: genetic map; QTL mapping & BSA; genetic distance; co-dominant genotypes; discovery & link to WGS; genomic selection. [Plot: number of individuals selected; selection results, good vs. bad.] See Hanneke Witsenboer in the Case Studies session (Track 2), 19 March 2015.
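Because SBG yields co-dominant genotypes, allele dosages can be compared directly between lines. The slide does not specify a distance measure; as an assumption, this sketch uses a simple allele-sharing distance (mean absolute dosage difference divided by 2), skipping loci missing in either line. The function name is hypothetical:

```python
from typing import Optional


def allele_sharing_distance(line_a: list[Optional[int]], line_b: list[Optional[int]]) -> float:
    """Mean per-locus allele difference between two lines, from 0 (identical) to 1.

    Genotypes are co-dominant dosage calls (0, 1 or 2 copies of one allele);
    loci with a missing call (None) in either line are skipped.
    """
    if len(line_a) != len(line_b):
        raise ValueError("lines must be scored at the same loci")
    diffs = [abs(a - b) for a, b in zip(line_a, line_b) if a is not None and b is not None]
    if not diffs:
        raise ValueError("no loci scored in both lines")
    return sum(diffs) / (2 * len(diffs))


if __name__ == "__main__":
    print(allele_sharing_distance([0, 1, 2, None], [2, 1, 2, 0]))  # 0.333...
```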

21 Transcript Sequencing / Profiling

22 RNA analysis: tetraploid cotton Iso-Seq results

                        root      leaf      stem
# Unigenes              16,287    19,522    17,123
# Unique transcripts    19,610    24,511    20,178
# HQ isoforms           38,332    46,795    43,380

[Venn diagram: number of unigenes in root, leaf and stem; region counts 2,240; 4,240; 3,896; 7,239; 2,428; 3,608; 6,107]
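Dividing the HQ isoform count by the unigene count gives the average number of isoforms per unigene in each tissue (about 2.35 in root, 2.40 in leaf and 2.53 in stem). A short sketch with the table values; the dictionary and function names are hypothetical:

```python
# Iso-Seq counts per tissue, copied from the cotton table.
COUNTS = {
    "root": {"unigenes": 16_287, "unique_transcripts": 19_610, "hq_isoforms": 38_332},
    "leaf": {"unigenes": 19_522, "unique_transcripts": 24_511, "hq_isoforms": 46_795},
    "stem": {"unigenes": 17_123, "unique_transcripts": 20_178, "hq_isoforms": 43_380},
}


def per_unigene(tissue: str, field: str) -> float:
    """Average number of `field` records per unigene in one tissue (2 decimals)."""
    row = COUNTS[tissue]
    return round(row[field] / row["unigenes"], 2)


if __name__ == "__main__":
    for tissue in COUNTS:
        print(tissue, per_unigene(tissue, "hq_isoforms"))
```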

23 RNA analysis: tetraploid cotton Iso-Seq results, haplotypes and splice variants

Haplotype 1 from the D genome:
leaf i3hq c26441/f3p16/3915   TTCGTTGGGTGTCGGCTGTTCGTGCGTTTC GGCTT          1480
leaf i3hq c26643/f27p17/3916  TTCGTTGGGTGTCGGCTGTTCGTGCGTTTC GGCTT          1479
leaf i3hq c26815/f1p15/3869   TTCGTTGGGTGTCGGCTGTTCGTGCGTTTC GGCTT          1475
leaf i4hq c359/f20p22/3914    TTCGTTGGGTGTCGGCTGTTCGTGCGTTTC GGCTT          1481
root i3hq c173/f43p28/3909    TTCGTTGGGTGTCGGCTGTTCGTGCGTTTC GGCTT          1479
root i3hq c11578/f2p28/3871   TTCGTTGGGTGTCGGCTGTTCGTGCGTTTC GGCTT          1479
root i4hq c177/f23p28/3915    TTCGTTGGGTGTCGGCTGTTCGTGCGTTTC GGCTT          1477
stem i3hq c12093/f14p5/3909   TTCGTTGGGTGTCGGCTGTTCGTGCGTTTC GGCTT          1480
stem i4hq c403/f9p5/3914      TTCGTTGGGTGTCGGCTGTTCGTGCGTTTC GGCTT          1485
Splice variant:
stem i3hq c16467/f2p2/3903    TTCGTTGGGTGTCGGCTGTTCGTGCGTTTCGGTCCTTGCGGCTT  1495

Haplotype 2 from the A genome:
leaf i3hq c16772/f13p17/3897  TTCGTTCGGTTTCGCTGTTCGTGCGTTTC GGCGT           1467
root i4hq c294/f6p21/3874     TTCGTTCGGTGTCGCTGTTCGTGCGTTTC GGCGT           1466
root i3hq c292/f24p24/3900    TTCGTTCGGTTTCGCTGTTCGTGCGTTTC GGCGT           1470
stem i3hq c14254/f8p7/3908    TTCGTTCGGTTTCGCTGTTCGTGCGTTTC GGCGT           1468
stem i3hq c19311/f2p7/3864    TTCGTTCGGTTTCGCTGTTCGTGCGTTTC GGCGT           1468
Splice variant:
leaf i3hq c17242/f1p7/3935    TTCGTTCGGTTTCGCTGTTCGTGCGTTTCGGTCCTTGCGGCGT   1492
leaf i3hq c26671/f6p11/3912   TTCGTTCGGTTTCGCTGTTCGTGCGTTTCGGTCCTTGCGGCGT   1483
leaf i3hq c26759/f1p7/3893    TTCGTTCGGTTTCGCTGTTCGTGCGTTTCGGTCCTTGCGGCGT   1482
root i3hq c8763/f4p10/3905    TTCGTTCGGTTTCGCTGTTCGTGCGTTTCGGTCCTTGCGGCGT   1478
stem i4hq c11340/f3p5/3927    TTCGTTCGGTTTCGCTGTTCGTGCGTTTCGGTCCTTGCGGCGT   1488
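In these excerpts, each splice-variant read carries the same extra 9-nt segment relative to its haplotype. A minimal sketch that locates a single insertion by trimming the common prefix and suffix (alignment gap removed from the excerpts; `find_insertion` is a hypothetical helper). Because the segment sits next to a GG repeat, the reported offset is one of several equivalent placements:

```python
def find_insertion(short: str, long: str) -> tuple[int, str]:
    """Locate a single contiguous insertion that turns `short` into `long`.

    Returns (offset in `long`, inserted sequence). Within repeats the insertion
    can shift without changing the result, so the offset is one valid placement.
    """
    if len(long) <= len(short):
        raise ValueError("`long` must be longer than `short`")
    # Longest common prefix.
    prefix = 0
    while prefix < len(short) and short[prefix] == long[prefix]:
        prefix += 1
    # Longest common suffix that does not overlap the prefix within `short`.
    suffix = 0
    while suffix < len(short) - prefix and short[-1 - suffix] == long[-1 - suffix]:
        suffix += 1
    if prefix + suffix != len(short):
        raise ValueError("sequences differ by more than one insertion")
    return prefix, long[prefix:len(long) - suffix]


if __name__ == "__main__":
    haplotype_1 = "TTCGTTGGGTGTCGGCTGTTCGTGCGTTTCGGCTT"
    splice_variant_1 = "TTCGTTGGGTGTCGGCTGTTCGTGCGTTTCGGTCCTTGCGGCTT"
    print(find_insertion(haplotype_1, splice_variant_1))  # (32, 'TCCTTGCGG')
```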

24 Tetraploid cotton Iso-Seq results: exon skipping. Searching the NCBI Protein database: leaf i3hq c1706/f2p3/3027 matched gb KHG Neurofilament heavy polypeptide [Gossypium arboreum] with 92% identity; leaf i3hq c18699/f1p3/3379 matched gb KHG Neurofilament heavy polypeptide [Gossypium arboreum] with 87% identity.

25 Lead Discovery with KeySeeQ: leveraging an RNA-based discovery approach.
1. Material choice and experimental design.
2. Wet lab protocols and procedures, starting from poly-A mRNA. Illumina: cDNA synthesis, fragmentation / 300 bp size selection, adaptor ligation, Illumina HiSeq sequencing. PacBio: cDNA synthesis with adapters, PCR amplification, size partitioning & SMRTbell ligation, RS II sequencing.
3. Sequence analysis pipeline: proprietary KeyGene development.

26 4. Lead Discovery with KeySeeQ: leveraging KeyGene's strength in computational transcriptomics (Figure 1). After RS sequencing, reads are cleaned (adapters and artifacts removed), clustered into isoform clusters, and passed through consensus calling and quality filtering to give the final, nonredundant transcript isoforms. With a reference genome, genes are mapped to it to obtain evidence-based gene models; without a reference genome, comparative analysis with known genes and comparative genome mapping and analysis are used. Evidence-based correlation then feeds network construction and the refinement and annotation of network modules, leading to a candidate gene list.
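The network steps are not specified on the slide. As an illustrative assumption, a co-expression network can be built by linking genes whose expression profiles correlate strongly across samples, with modules taken as connected components. All names and the 0.9 cut-off are hypothetical:

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile: correlation undefined")
    return sxy / sqrt(sxx * syy)


def coexpression_modules(profiles: dict[str, list[float]], min_r: float = 0.9) -> list[set[str]]:
    """Connected components of a graph linking gene pairs with Pearson r >= min_r."""
    neighbours = {gene: set() for gene in profiles}
    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= min_r:
            neighbours[a].add(b)
            neighbours[b].add(a)
    modules, seen = [], set()
    for gene in profiles:
        if gene in seen:
            continue
        # Depth-first search collects one connected component (module).
        stack, component = [gene], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        modules.append(component)
    return modules


if __name__ == "__main__":
    # Toy expression values across four samples (not data from the slides).
    toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8.5], "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 2.5]}
    print(coexpression_modules(toy))
```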

27 Make allelic series of target genes with KeyPoint Mutation Breeding.

28 KeyPoint Mutation Breeding: expedited allele development. Steps: EMS-treated seed; young treated plants; detection of novel variants; verification of novel variant plants. High-throughput screening of randomly induced variation.
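The slide does not describe KeyPoint's variant filter. One widely used sanity check when screening EMS populations is that EMS mostly induces G/C to A/T transitions, so induced point mutations should mainly appear as G>A or C>T relative to the reference. A minimal sketch; the record layout and function names are hypothetical:

```python
# EMS alkylates guanine, so induced point mutations are predominantly
# G:C -> A:T transitions, seen as G>A or C>T depending on the strand.
EMS_TYPE = {("G", "A"), ("C", "T")}

Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def is_ems_type(ref: str, alt: str) -> bool:
    """True for a single-base G>A or C>T change (case-insensitive)."""
    return (ref.upper(), alt.upper()) in EMS_TYPE


def ems_candidates(variants: list[Variant]) -> list[Variant]:
    """Keep variant records consistent with canonical EMS mutagenesis."""
    return [v for v in variants if is_ems_type(v[2], v[3])]


if __name__ == "__main__":
    calls = [("chr1", 10, "G", "A"), ("chr1", 20, "A", "G"), ("chr2", 5, "c", "t")]
    print(ems_candidates(calls))  # keeps the G>A and c>t records
```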

29 KeyGene's successful track record with KeyPoint Mutation Breeding

30 Integrate and exploit genomic knowledge with <Crop>Pedia

31 <Crop>Pedia track record. Crops: Tomato, Pepper, Melon, Cucumber, Lettuce, Eggplant, Radish, Carrot, Brassica (B. oleracea, B. rapa, B. napus), Oil palm, Cotton, Rice, Corn, Tobacco, Potato, Arabidopsis. Pathogens / other: Nematode, Xanthomonas, Verticillium, Yeast. <Crop>Pedia: 12 companies, 19 different crops plus other organisms, ~100 molecular breeders.

32 Thank you! The Sequence Based Genotyping, WGP, KeyPoint, SNPSelect, Targeted Gap Filling and Directed Genomic Selection technologies are covered by patents and patent applications owned by KeyGene N.V. Pacific Biosciences, PacBio, SMRT and Iso-Seq are (registered) trademarks of Pacific Biosciences, Inc. All other product names, brand names or company names are used for identification purposes only and may be (registered) trademarks of their respective owners. KeyGene, WGP, KeySeeQ and PhenoFab are registered trademarks in one or more territories of the world.