How much sequencing do I need? Emily Crisovan Genomics Core

Similar documents
How much sequencing do I need? Emily Crisovan Genomics Core September 26, 2018

Bioinformatics Advice on Experimental Design

Single Cell Transcriptomics scrnaseq


Matthew Tinning Australian Genome Research Facility. July 2012

Applications of Next Generation Sequencing in Metagenomics Studies

Next-generation sequencing technologies

Single Cell Genomics

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Genomic & RNA Profiling Core Facility

Introduction to Microbial Sequencing

resequencing storage SNP ncrna metagenomics private trio de novo exome ncrna RNA DNA bioinformatics RNA-seq comparative genomics

Experimental Design. Dr. Matthew L. Settles. Genome Center University of California, Davis

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel.

ngs metagenomics target variation amplicon bioinformatics diagnostics dna trio indel high-throughput gene structural variation ChIP-seq mendelian

Deep Sequencing technologies

Next-Generation Sequencing Services à la carte

SO YOU WANT TO DO A: RNA-SEQ EXPERIMENT MATT SETTLES, PHD UNIVERSITY OF CALIFORNIA, DAVIS

Long and short/small RNA-seq data analysis

NGS part 2: applications. Tobias Österlund

CM581A2: NEXT GENERATION SEQUENCING PLATFORMS AND LIBRARY GENERATION

FRAUNHOFER INSTITUTE FOR INTERFACIAL ENGINEERING AND BIOTECHNOLOGY IGB NEXT-GENERATION SEQUENCING. From wet lab to dry lab complete sample analysis

Plant Breeding and Agri Genomics. Team Genotypic 24 November 2012

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Genomics and Transcriptomics of Spirodela polyrhiza

Next-generation sequencing technologies

Understanding the science and technology of whole genome sequencing

Research school methods seminar Genomics and Transcriptomics

Data Analysis with CASAVA v1.8 and the MiSeq Reporter

RNA-Sequencing analysis

Introduction to the MiSeq

Single Cell Genomics

Next Generation Sequencing. Tobias Österlund

Announcements. Coffee! Evalua,on. Dr. Yoshiki Sasai, R.I.P.

Genome sequencing in Senecio squalidus

DNA. bioinformatics. epigenetics methylation structural variation. custom. assembly. gene. tumor-normal. mendelian. BS-seq. prediction.

Contact us for more information and a quotation

BST 226 Statistical Methods for Bioinformatics David M. Rocke. March 10, 2014 BST 226 Statistical Methods for Bioinformatics 1

Experimental Design Microbial Sequencing

The Genome Analysis Centre. Building Excellence in Genomics and Computa5onal Bioscience

Integrated NGS Sample Preparation Solutions for Limiting Amounts of RNA and DNA. March 2, Steven R. Kain, Ph.D. ABRF 2013

* Custom assays developed based on customer requirements. * Rigorous QC process to ensure test performance and accuracy

Next- gen sequencing. STAMPS 2015 Hilary G. Morrison Joe Vineis, Nora Downey, Be>e Hecox- Lea, Kim Finnegan

Standard Products Nucleic Acid Sample Submission Guideline

Transcriptomics analysis with RNA seq: an overview Frederik Coppens

GENOTYPING-BY-SEQUENCING USING CUSTOM ION AMPLISEQ TECHNOLOGY AS A TOOL FOR GENOMIC SELECTION IN ATLANTIC SALMON

The Expanded Illumina Sequencing Portfolio New Sample Prep Solutions and Workflow

02 Agenda Item 03 Agenda Item

RNA sequencing with the MinION at Genoscope

Next Generation Sequencing: An Overview

Genome Resequencing. Rearrangements. SNPs, Indels CNVs. De novo genome Sequencing. Metagenomics. Exome Sequencing. RNA-seq Gene Expression

Illumina s Suite of Targeted Resequencing Solutions

A Genomics (R)evolution: Harnessing the Power of Single Cells

Wet-lab Considerations for Illumina data analysis

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Quan9fying with sequencing. Week 14, Lecture 27. RNA-Seq concepts. RNA-Seq goals 12/1/ Unknown transcriptome. 2. Known transcriptome

Applications of short-read

Throughput cells cells. Methodology Full transcript or end-counting end-counting. Chemistry SMARTer V SMARTer V. Run time hours.

Using New ThiNGS on Small Things. Shane Byrne

Product selection guide Ion S5 and Ion S5 XL Systems

Supplementary Information

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Sequencing applications. Today's outline. Hands-on exercises. Applications of short-read sequencing: RNA-Seq and ChIP-Seq

G E N OM I C S S E RV I C ES

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

NextGen Sequencing and Target Enrichment

Genomic resources. for non-model systems

Analysing genomes and transcriptomes using Illumina sequencing

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

SCGC Service Description For further information, go to scgc.bigelow.org Updated June 1, 2018

Development of quantitative targeted RNA-seq methodology for use in differential gene expression

RNASEQ WITHOUT A REFERENCE

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis

Overcome limitations with RNA-Seq

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Lecture 7. Next-generation sequencing technologies

Next-generation sequencing Technology Overview

RNA Sequencing. Next gen insight into transcriptomes , Elio Schijlen

High performance sequencing and gene expression quantification

Parts of a standard FastQC report

Product selection guide Ion GeneStudio S5 Series

Next-Generation Sequencing. Technologies

Quality Control of Next Generation Sequence Data

Human Genome Sequencing Over the Decades The capacity to sequence all 3.2 billion bases of the human genome (at 30X coverage) has increased

Next-Generation Sequencing: custom solutions

RNA-SEQUENCING ANALYSIS

About Strand NGS. Strand Genomics, Inc All rights reserved.

IMGM Laboratories GmbH. Sales Manager

Admera Health RUO Services

Single-Cell Genomics Deep insights into complex biology

NEXT GENERATION SEQUENCING. Farhat Habib

DE NOVO WHOLE GENOME ASSEMBLY AND SEQUENCING OF THE SUPERB FAIRYWREN. (Malurus cyaneus) JOSHUA PEÑALBA LEO JOSEPH CRAIG MORITZ ANDREW COCKBURN

The Genome Analysis Centre. Building Excellence in Genomics and Computational Bioscience

SMARTer Ultra Low RNA Kit for Illumina Sequencing Two powerful technologies combine to enable sequencing with ultra-low levels of RNA

Introduction Bioo Scientific

GENOMICS WORKFLOW SOLUTIONS THAT GO WHERE THE SCIENCE LEADS. Genomics Solutions Portfolio

The New Genome Analyzer IIx Delivering more data, faster, and easier than ever before. Jeremy Preston, PhD Marketing Manager, Sequencing

Transcriptome Assembly, Functional Annotation (and a few other related thoughts)

Aaron Liston, Oregon State University Botany 2012 Intro to Next Generation Sequencing Workshop

14 March, 2016: Introduction to Genomics

Transcription:

How much sequencing do I need? Emily Crisovan Genomics Core

How much sequencing? Three questions: 1. How much sequence is required for good experimental design? 2. What type of sequencing run is best? 3. How many lanes of sequencing? **All based on Illumina sequencing options**

Experimental Design What are you sequencing? Genome De novo assembly Resequencing project Transcriptome De novo assembly Gene expression project Amplicon sequencing Whole meta-genomes Small RNAs ChIP-seq Exome capture

What type of sequencing run? Single end (SE) or paired end (PE)? What read length? 35 bp, 50 bp, 75 bp, 150 bp, 250 bp, 300 bp Not all read lengths are available on all machines Assembly of genome or transcriptome? PE 150 bp, 250 bp, 300 bp Counting experiment? SE 35 bp, 50 bp, 75 bp

How many lanes of sequencing? Genome assembly Depends on the desired coverage New assembly: 75x 100x Resequencing: 10x 20x Long-read error correction: 20x 30x Transcriptome assembly Number of genes in the genome Complexity of the transcriptome # lanes required = desired Gbp / expected Gbp per lane

How many lanes of sequencing? For gene expression analysis Counting experiment What is typical in you field? Consider ploidy How many replicates? Account for variability between samples # lanes required = (minimum # reads per sample x # replicates x # samples x fudge factor) / # of reads per lane

The fudge factor There will always be variation in the number of reads per sample per lane Need to account for this when designing experiment It is hard to assign a specific value to the fudge factor Call/e-mail us to discuss the fudge factor

Genome Sequencing Example #1 New eukaryotic genome assembly 1.2 Gbp genome Target 80x coverage PE 150 HiSeq 4000 averages 350 million reads per lane How many lanes of sequencing do you need? # lanes required = desired Gbp / expected Gbp per lane What changes if this was a resequencing project?

Genome Sequencing Example #1 Calculations Calculate expected Gbp per lane of HiSeq 4000 PE150: (# of reads x read length) / 1,000,000,000 (350,000,000 x 300) / 1,000,000,000 = 105 Gbp Calculate desired Gbp: 1.2 Gbp x 80 = 96 Gbp Calculate # of lanes required: 96 Gbp / 105 Gbp = 0.91 lanes à round up to 1 lane because we do not sell partial lanes What changes if this was a resequencing project? The desired coverage would be reduced to 10-20x

Genome Sequencing Example #2 New prokaryotic genome 12 different bacterial isolates 8 Mbp genome Target 40x coverage PE 150 HiSeq 4000 averages 350 million reads per lane How many lanes of sequencing do you need? # lanes required = desired Gbp / expected Gbp per lane

Genome Sequencing Example #2 Calculations Calculate expected Gbp per lane of HiSeq 4000 PE150: (# of reads x read length) / 1,000,000,000 (350,000,000 x 300) / 1,000,000,000 = 105 Gbp Calculate desired Gbp: 8 Mbp x 40x coverage x 12 isolates = 3.84 Gbp Calculate # of lanes required: 3.84 Gbp / 105 Gbp = 0.04 lanes The HiSeq 4000 is not the appropriate sequencer for this project.

Genome Sequencing Example #2 New prokaryotic genome 12 different bacterial isolates 8 Mbp genome Target 40x coverage PE 150 HiSeq 4000 averages 350 million reads per lane MiSeq v2 Standard PE 250 gives ~6.0-7.5 Gbp per run How many lanes of sequencing do you need? # lanes required = desired Gbp / expected Gbp per lane

Genome Sequencing Example #3 Whole genome metagenomics sequencing Unknown number of fungal, bacterial & other species Unknown genome sizes PE 150 HiSeq 4000 averages 350 million reads per lane How many lanes of sequencing do you need? # lanes required = desired Gbp / expected Gbp per lane Perform experiment to determine what is present and then go forward from there

Transcript Sequencing Example Transcript assembly 25,000 genes Target of 60 million reads per sample PE 150 HiSeq 4000 averages 350 million reads/lane How many different mrna samples can be prepared and loaded on one lane?

Transcript Sequencing Example Calculations # of reads per lane / # of reads per sample = # of samples that can be loaded on one lane 350 M reads / 60 M reads per sample = 5.8 samples à round down to 5 Calculate the actual average to see if there is enough wiggle room: 350 M reads / 5 samples = 70 M reads per sample Yes, this is enough wiggle room.

Gene Expression Example Gather counts for differential expression analysis Mammals: 30 50 million reads per sample Plants: 25 million reads per sample Replicates: 3 5 # of samples is experiment-dependent SE 50 HiSeq 4000 averages 350 million reads/lane # lanes required = (minimum # reads per sample x # replicates x # samples x fudge factor) / # reads per lane

Gene Expression Example Method 1 How many lanes of sequencing are needed if you have 6 samples with 3 replicates each and you would like a minimum of 30 million reads each? # lanes required = (minimum # reads per sample x # samples x # replicates x fudge factor) / # reads per lane 30 M reads x 6 samples x 3 replicates x 1.25 fudge factor = 675 M reads 675 M reads / 350 M reads per lane = 1.9 lanes è 2 lanes 350 M reads per lane * 2 lanes = 700 M reads total 700 M reads total / (6 samples * 3 replicates) = 38.8 M reads per sample Is this enough wiggle room?

Gene Expression Example Method 2 How many lanes of sequencing are needed if you have 6 samples with 3 replicates each and you would like a minimum of 30 million reads each? 30 M reads x 6 samples x 3 replicates = 540 M reads 540 M reads / 350 M reads per lane = 1.5 lanes è 2 lanes 350 M reads per lane * 2 lanes = 700 M reads total 700 M reads total / (6 samples * 3 replicates) = 38.8 M reads per sample Is this enough wiggle room?

Amplicon Sequencing Example Sequencing the same target from multiple samples Metagenomic survey Specific target from many individuals (ex. 16S V4) Barcoding required PE 250 MiSeq standard run 8-10 million read pairs expected Coverage dependent on number of samples Variation between samples is very large # runs required = # read pairs desired * # samples * fudge factor / read pairs per run

Amplicon Sequencing Example You have 96 samples and you would like 70,000 read pairs per sample 9,000,000 read pairs per run / 96 samples = ~93,750 read pairs per sample With amplicon sequencing you will receive a wide range of reads per sample, for instance, 30,000 150,000 read pairs per sample. Will this be suitable for your experiment? If not, reduce the number of samples per run.

Small RNA Sequencing Example Goal is to gather counts for differential expression analysis For mirnas, 10 million reads are common 3 to 5 replicates SE 50 # lanes required = (minimum # reads per sample x # replicates x # samples x fudge factor) / # reads per lane Same calculations as a gene expression study