The Genome Analysis Centre. Building Excellence in Genomics and Computational Bioscience

Similar documents
Functional genomics to improve wheat disease resistance. Dina Raats Postdoctoral Scientist, Krasileva Group

A mutation in TaGW2-A increases thousand grain weight in wheat. James Simmonds

Matthew Tinning Australian Genome Research Facility. July 2012

The international effort to sequence the 17Gb wheat genome: Yes, Wheat can!

Metagenomic 3C, full length 16S amplicon sequencing on Illumina, and the diabetic skin microbiome

Deep Sequencing technologies

Experimental Design Microbial Sequencing

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Introduction to Microbial Sequencing

Get to Know Your DNA. Every Single Fragment.

Genome Resequencing. Rearrangements. SNPs, Indels CNVs. De novo genome Sequencing. Metagenomics. Exome Sequencing. RNA-seq Gene Expression

Single Cell Genomics

Contact us for more information and a quotation

How much sequencing do I need? Emily Crisovan Genomics Core

Next-generation sequencing technologies

Selecting TILLING mutants

A draft sequence of bread wheat chromosome 7B based on individual MTP BAC sequencing using pair end and mate pair libraries.

Next-Generation Sequencing. Technologies

How much sequencing do I need? Emily Crisovan Genomics Core September 26, 2018

Plant Breeding and Agri Genomics. Team Genotypic 24 November 2012

Illumina s Suite of Targeted Resequencing Solutions

CM581A2: NEXT GENERATION SEQUENCING PLATFORMS AND LIBRARY GENERATION

Next Generation Sequencing. Tobias Österlund

RIPTIDE HIGH THROUGHPUT RAPID LIBRARY PREP (HT-RLP)

Integrated NGS Sample Preparation Solutions for Limiting Amounts of RNA and DNA. March 2, Steven R. Kain, Ph.D. ABRF 2013

Sequence-based assembly of chromosome 7A and comparison to diploid progenitors

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

The Genome Analysis Centre. Building Excellence in Genomics and Computa5onal Bioscience

ACCEL-NGS 2S DNA LIBRARY KITS

Next Generation Sequences & Chloroplast Assembly. 8 June, 2012 Jongsun Park

GENOTYPING-BY-SEQUENCING USING CUSTOM ION AMPLISEQ TECHNOLOGY AS A TOOL FOR GENOMIC SELECTION IN ATLANTIC SALMON

Outline General NGS background and terms 11/14/2016 CONFLICT OF INTEREST. HLA region targeted enrichment. NGS library preparation methodologies

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

De Novo Assembly of High-throughput Short Read Sequences

Introduction to the MiSeq

Genomic Technologies. Michael Schatz. Feb 1, 2018 Lecture 2: Applied Comparative Genomics

Considerations for Illumina library preparation. Henriette O Geen June 20, 2014 UCD Genome Center

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis

Sequence Assembly and Alignment. Jim Noonan Department of Genetics

Welcome to the NGS webinar series

Molecular Biology: DNA sequencing

SO YOU WANT TO DO A: RNA-SEQ EXPERIMENT MATT SETTLES, PHD UNIVERSITY OF CALIFORNIA, DAVIS

Experimental Design. Dr. Matthew L. Settles. Genome Center University of California, Davis

Odyssey of the IWGSC Reference Genome Sequence: 12 years 1 month 28 days 11 hours 10 minutes and 14 seconds.

Incorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits

APPLICATION NOTE

Genome Sequence Assembly

HaloPlex HS. Get to Know Your DNA. Every Single Fragment. Kevin Poon, Ph.D.

Next-generation sequencing and quality control: An introduction 2016

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel.

Human Genome Sequencing Over the Decades The capacity to sequence all 3.2 billion bases of the human genome (at 30X coverage) has increased

Single Cell Transcriptomics scrnaseq

Massive Analysis of cdna Ends for simultaneous Genotyping and Transcription Profiling in High Throughput

Lecture 7. Next-generation sequencing technologies

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

Mate-pair library data improves genome assembly

The genome of Fraxinus excelsior (European Ash)

Bioinformatics Advice on Experimental Design

Aaron Liston, Oregon State University Botany 2012 Intro to Next Generation Sequencing Workshop

DNA METHYLATION NH 2 H 3. Solutions using bisulfite conversion and immunocapture Ideal for NGS, Sanger sequencing, Pyrosequencing, and qpcr

Comparison and Evaluation of Cotton SNPs Developed by Transcriptome, Genome Reduction on Restriction Site Conservation and RAD-based Sequencing

How it All Works. Sample. Data analysis. Library Prepara>on. Sequencing

Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide.

High-yield, Scalable Library Preparation with the NEBNext Ultra II FS DNA Library Prep Kit

Design a super panel for comprehensive genetic testing

Nextera DNA Sample Prep Kit (Roche FLX-compatible)

Wet-lab Considerations for Illumina data analysis

NGS developments in tomato genome sequencing

RNA-Sequencing analysis

Course Overview: Mutation Detection Using Massively Parallel Sequencing

Development of quantitative targeted RNA-seq methodology for use in differential gene expression

SureSelect XT HS. Target Enrichment

Genome sequencing in Senecio squalidus

De Novo and Hybrid Assembly

Next Generation Sequencing in Genetic Diagnostics Alan Pittman, PhD

Analysis of Differential Gene Expression in Cattle Using mrna-seq

Targeted Sequencing Using Droplet-Based Microfluidics. Keith Brown Director, Sales

Single Cell Genomics

Using New ThiNGS on Small Things. Shane Byrne

Nextera DNA Sample Prep Kit

EpiGnome Methyl-Seq - DNA Methylation Analysis using Whole Genome Bisulfite Sequencing

Sample to Insight. Dr. Bhagyashree S. Birla NGS Field Application Scientist

RADSeq Data Analysis. Through STACKS on Galaxy. Yvan Le Bras Anthony Bretaudeau Cyril Monjeaud Gildas Le Corguillé

Using the Potato Genome Sequence! Robin Buell! Michigan State University! Department of Plant Biology! August 15, 2010!

A Genomics (R)evolution: Harnessing the Power of Single Cells

Read Quality Assessment & Improvement. UCD Genome Center Bioinformatics Core Tuesday 14 June 2016

Unique, dual-matched adapters mitigate index hopping between NGS samples. Kristina Giorda, PhD

NGS technologies: a user s guide. Karim Gharbi & Mark Blaxter

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Cat. Nos. NT09115, NT091120, NT , NT , and NTBC0950 DISCONTINUED

MHC Region. MHC expression: Class I: All nucleated cells and platelets Class II: Antigen presenting cells

Direct determination of diploid genome sequences. Supplemental material: contents

Genome Projects. Part III. Assembly and sequencing of human genomes

Performance Characteristics drmid Dx for Illumina NGS systems

SEQUENCING FROM SAMPLE TO SEQUENCE READY

Genomic resources. for non-model systems

Why are we here? Introduction

Biol 478/595 Intro to Bioinformatics

Transcription:

Building Excellence in Genomics and Computational Bioscience

Wheat genome sequencing: an update from TGAC Sequencing Technology Development now Plant & Microbial Genomics Group Leader Matthew Clark matt.clark@tgac.ac.uk

Project themes 1. Genome sequencing Improve genome 2. Sequence 7 arms Minimal Tiling Paths 3. Undercover new genetic variation TILLING Freely available Easy to browse, search & download etc. The Centre

Chromosome arm assemblies Illumina sequence, Amplified DNA >10M contigs, average N50 2.4kb Jon Wright.

Improving the genome Small fragments Wrong size> bad assembly GC bias > Bad assembly Darren Heavens.

Improving the genome Number Of Fragments In Each Bin Fragment Size (bp) Read Type Machine Read Length Fragment Number Of Read Coverage Size (bp) Bases (Gbps) pair end MiSeq 2 x 250 bp 700 12.9 0.76x pair end HiSeq 2500 2 x 150 bp 700 131.7 7.74x pair end HiSeq 2000 2 x 100 bp 700 335.3 19.73x mate pair MiSeq 2 x 250 bp 6500 12.7 0.75x (9.67x*) Paul Bailey & Bernardo Clavijo. *physical span coverage

Improving the genome 5 read 1 read 2 3 5 5 3 5 Nextclip tool (submitted) to classify Nextera mate pairs Adaptor in both reads (A) 12.64 % Adaptor in read2 only (B) 24.66 % Adaptor in read1 only (C) 25.76 % Adaptor not in any read (D) 18.18 % mate pairs? Relaxed Category A(E) 0.17 % 63% Mate pairs Richard Leggett.

Cat.A, B, C 6.5 kb Cat.D Number Of Fragments In Each Bin RF FR Cat.A, B, C unique reads RF both unique RF one unique RF FR 3 5% of reads map uniquely @ 2*250bp Paul Bailey.

40kb fosmid mate pairs Broad Fosills But 10x coverage needs 100 ligations Mike Bevan JIC The Centre

2. Sequencing 7 Chromosome arms 1. 3DL BACs > LTC MTP> BAC DNAs made. 2. 2DS & 2DL BAC libraries profiled, problems. 3. 2BS, 2BL and 4BL BAC constructed. 4. 4BS remains to be constructed. Matt Clark.

2DS & 2DL contamination contigs Align WGP tags to CSS contigs Get a single location for each BAC Integrate FPC map data Find contigs where all BACs hit the same (non target) arm Exclude these contigs and recalculate map stats Jon Wright.

2DS 600 500 400 1632 698 Number of contigs 300 Size(Mb) 200 100 2DL 0 1200 1000 Before After Estimated 3307 Size (Mb) 800 600 400 1605 200 0 Before After Estimated Jon Wright.

250 200 The origin of contamination 2DS 150 100 50 0 350 300 250 1AL1AS 1BL 1BS 1DL1DS2AL 2AS 2BL 2BS 2DL2DS3AL3AS 3B 3DL3DS4AL 4AS 4BL 4BS 4DL4DS5AL5AS 5BL 5BS 5DL5DS6AL6AS 6BL 6BS 6DL6DS7AL 7AS 7BL 7BS 7DL7DS 2DL 200 150 100 50 0 1AL1AS 1BL 1BS 1DL1DS2AL 2AS 2BL 2BS 2DL2DS3AL3AS 3B 3DL3DS4AL 4AS 4BL 4BS 4DL4DS5AL5AS 5BL 5BS 5DL5DS6AL6AS 6BL 6BS 6DL6DS7AL 7AS 7BL 7BS 7DL7DS Jon Wright.

How to sequence 10,000 BACs? 1. NGS ready BAC DNA High throughput & in microtitre plates 2. Cheap indexed libraries <<< $100/library 3. High throughput Process 10k BACs (chromosome) in weeks 4. Good assemblies Few contigs per BAC, ideally one. The Centre

NGS ready BAC DNA Library plate Copy and grow Deep culture plate Bead based DNA prep Removal gdna 95+% pure BAC DNA The Centre ~100ng BAC DNA

Cheap indexed libraries The Centre

Cheap indexed libraries Tagmentation 95% pure BAC DNA Size selection Add indexed primers & PCR $7 (FEC)/BAC = DNA & library $8 (FEC)/BAC = sequencing @384 plex The Centre

Cheap indexed libraries 0.0003% incorrect

Cheap indexed libraries IPK 454 (x) versus TGAC Illumina (y) TGAC assemblies Velvet N50 = ~5kb Deepali Vasoya.

High throughput A. Scale up 1 plate to many B. Fill an Illumina HiSeq 2000 C. 16 lanes @ 384plex = 6,144 BACs D. DNA and libraries + 11 days to sequence E. 6,144 2H MTP BACs in 4 weeks, from bacteria to sequence The Centre

IPK assembly using CLC The Centre

384 BACs Improved assemblies Nextera mate pair (384 BAC pool) Assign to BACs (map to PE assemblies) Deepali Vasoya.

Improved assemblies using LMPs Improved assemblies & 4kb (+/ 1kb) mate pair CLC assemblies Deepali Vasoya.

3. Undercover new genetic variation 1. Population of (EMS) mutagenised plants; 2. High throughput screen to identify mutations in a gene of interest. 3. Two populations, Kronos (durum) and Cadenza (bread) 4. Exome capture 5. Low-cost sequencing Reverse Genetics Dubcovski (UCD), Philips (R Res) &Uauy (JIC).

Wheat capture for TILLing T. urartu and T. turgidum RNAseq de novo assembled (CLC) + T. aestivum, wheatified barley models and popular wheat genes CDS models aligned to CSS arms (exonerate) filtered by length and %id Exons padded with +/ 30bp genomic sequence Krasileva et al., Genome Biology 2013, 14:R66 Sarah Ayling.

Wheat capture stats Capture contains 82,091 transcripts, in 286,799 exons, totalling 84 Mb 200,584 exons were padded +/ 30bp (70%) 11,344 transcripts were included without a genomic model Alpha design currently being synthesised Sarah Ayling.

Exome capture of 1,536 4x and 6x mutants Submitted >250,000 exons (82+k transcripts) for Nimblegen design Sequencing of first mutants by end of the year By end of 2014: in silico access to TILLING alleles in 4x and 6x wheat >85% probability of a knock-out allele Access to mutants We plan to hold mirror collection of seeds at UC Davis and JIC Germplasm Collection Mutants will be free from any IP for the mutations people find We plan to charge a small fee for ~10 seeds of each mutant to maintain collections

Acknowledgements Computational Genomics Paul Bailey Jon Wright Bernardo Clavijo Dharanya Sampath Sarah Ayling Plant & Microbial Genomics Darren Heavens Deepali Vasoya Library & Sequencing teams Mario Caccamo The Centre

Fellowship Programme in Computational Biology Bioinformatics and Computational Biology. www.tgac.ac.uk/fellowship Competitive salary and research support grant. Fast application process, call open Fellow until posts are filled. Up to five years: for early career scientists who want to become scientific leaders in a dynamic research environment!