Bioinformatics Advice on Experimental Design

Similar documents
How much sequencing do I need? Emily Crisovan Genomics Core September 26, 2018

How much sequencing do I need? Emily Crisovan Genomics Core

resequencing storage SNP ncrna metagenomics private trio de novo exome ncrna RNA DNA bioinformatics RNA-seq comparative genomics

Next-generation sequencing technologies

Welcome to the NGS webinar series

DNA. bioinformatics. epigenetics methylation structural variation. custom. assembly. gene. tumor-normal. mendelian. BS-seq. prediction.

Single Cell Transcriptomics scrnaseq

Next-Generation Sequencing. Technologies

Genomic & RNA Profiling Core Facility

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel.

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Research school methods seminar Genomics and Transcriptomics

solid S Y S T E M s e q u e n c i n g See the Difference Discover the Quality Genome

Deep Sequencing technologies

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies

Genomic Resources and Gene/QTL Discovery in Livestock

SEQUENCING. M Ataei, PhD. Feb 2016

Transcriptomics analysis with RNA seq: an overview Frederik Coppens


The Expanded Illumina Sequencing Portfolio New Sample Prep Solutions and Workflow

About Strand NGS. Strand Genomics, Inc All rights reserved.

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Cancer Genetics Solutions

Introduction to BIOINFORMATICS

DNA METHYLATION RESEARCH TOOLS

Plant Breeding and Agri Genomics. Team Genotypic 24 November 2012

G E N OM I C S S E RV I C ES

Gene expression analysis. Biosciences 741: Genomics Fall, 2013 Week 5. Gene expression analysis

RNA-Seq data analysis course September 7-9, 2015

DNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer.

Experimental design of RNA-Seq Data

RNA-SEQUENCING ANALYSIS

RNA-Sequencing analysis

Genome Resequencing. Rearrangements. SNPs, Indels CNVs. De novo genome Sequencing. Metagenomics. Exome Sequencing. RNA-seq Gene Expression

02 Agenda Item 03 Agenda Item

NGS-based innovations within the Leiden Network

Sample to Insight. Dr. Bhagyashree S. Birla NGS Field Application Scientist

Introduction to Bioinformatics and Gene Expression Technology

The New Genome Analyzer IIx Delivering more data, faster, and easier than ever before. Jeremy Preston, PhD Marketing Manager, Sequencing

The Genome Analysis Centre. Building Excellence in Genomics and Computa5onal Bioscience

Next Generation Sequencing. Tobias Österlund

ACCEL-NGS 2S DNA LIBRARY KITS

Matthew Tinning Australian Genome Research Facility. July 2012

Next Generation Sequencing Lecture Saarbrücken, 19. March Sequencing Platforms

Experimental Design. Dr. Matthew L. Settles. Genome Center University of California, Davis

Massive Analysis of cdna Ends for simultaneous Genotyping and Transcription Profiling in High Throughput

QIAGEN s NGS Solutions for Biomarkers NGS & Bioinformatics team QIAGEN (Suzhou) Translational Medicine Co.,Ltd

Aaron Liston, Oregon State University Botany 2012 Intro to Next Generation Sequencing Workshop

Introduction to Bioinformatics

Overcome limitations with RNA-Seq

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Next generation sequencing techniques" Toma Tebaldi Centre for Integrative Biology University of Trento

The Agilent Technologies SureSelect Platform for Target Enrichment

SureSelect Target Enrichment for the Ion Proton TM Next Generation Sequencing System

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

High throughput sequencing technologies

Introduction to Microbial Sequencing

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Single Cell Genomics

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

GENOMICS WORKFLOW SOLUTIONS THAT GO WHERE THE SCIENCE LEADS. Genomics Solutions Portfolio

Wet-lab Considerations for Illumina data analysis

Overview of Next Generation Sequencing technologies. Céline Keime

Targeted Sequencing Using Droplet-Based Microfluidics. Keith Brown Director, Sales

Contact us for more information and a quotation

Next-Generation Sequencing Services à la carte

The MiniSeq System. Explore the possibilities. Discover demonstrated NGS workflows for molecular biology applications.

Next-generation sequencing technologies

Ultrasequencing: methods and applications of the new generation sequencing platforms

Genomic solutions for complex disease

NOW GENERATION SEQUENCING. Monday, December 5, 11

Human Genome Sequencing Over the Decades The capacity to sequence all 3.2 billion bases of the human genome (at 30X coverage) has increased

GENOTYPING-BY-SEQUENCING USING CUSTOM ION AMPLISEQ TECHNOLOGY AS A TOOL FOR GENOMIC SELECTION IN ATLANTIC SALMON

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis

Surely Better Target Enrichment from Sample to Sequencer

Next Generation Sequencing Technologies. Some slides are modified from Robi Mitra s lecture notes

NGS Approaches to Epigenomics

Course Overview: Mutation Detection Using Massively Parallel Sequencing

Outline. General principles of clonal sequencing Analysis principles Applications CNV analysis Genome architecture

Introduction to RNA-Seq in GeneSpring NGS Software

Modern Epigenomics. Histone Code

ngs metagenomics target variation amplicon bioinformatics diagnostics dna trio indel high-throughput gene structural variation ChIP-seq mendelian

Next Generation Sequencing: An Overview

14 March, 2016: Introduction to Genomics

Supplementary Information

Illumina Technology Updates

Assay Validation Services

Statistical Genomics and Bioinformatics Workshop. Genetic Association and RNA-Seq Studies

Long and short/small RNA-seq data analysis

Genome Sequencing. I: Methods. MMG 835, SPRING 2016 Eukaryotic Molecular Genetics. George I. Mias

The Journey of DNA Sequencing. Chromosomes. What is a genome? Genome size. H. Sunny Sun

From reads to results: differential. Alicia Oshlack Head of Bioinformatics

Applications of Next Generation Sequencing in Metagenomics Studies

SMARTer Ultra Low RNA Kit for Illumina Sequencing Two powerful technologies combine to enable sequencing with ultra-low levels of RNA

Ion S5 and Ion S5 XL Systems

Axiom Biobank Genotyping Solution

Application of Genotyping-By-Sequencing and Genome-Wide Association Analysis in Tetraploid Potato

Services Presentation Genomics Experts

Ion S5 and Ion S5 XL Systems

Transcription:

Bioinformatics Advice on Experimental Design Where do I start? Please refer to the following guide to better plan your experiments for good statistical analysis, best suited for your research needs. Statistics cannot rescue a bad experimental design. Please contact our Bioinformatics team for a consultation when in doubt. Next Generation (NGS) experiments Many steps in the experimental process can introduce various biases and errors, and careful consideration must be given to the following aspects: Platform choice: Platform Company Read length Samples per run Reads per run Run time Website Platform Hiseq 2000 SOLiD 4 system HeliScope Illumina Applied Biosystems 2x100bp 8 ~300million 8 days 2x100-150bp 16 ~800 million 8 days 50 +25bp 16 >700 million 11-13 days Helicos Biosciences ~30bp 50 ~500 million 8 days www.illumina.com www.illumina.com www.appliedbiosystems.com www.helicosbio.com Genome Sequencer FLX Titanium System Roche Genome Analyzer IIx Illumina 400-600bp 16 ~1 million 10 h www.454.com These numbers change rapidly as technology improves. Please note that these numbers are based on data from Oct. 2010. Please refer to the websites listed under each platform for the latest numbers. Type of Run Paired End (PE) or Single End (SE): The following table provides a guide to what type to run is recommended for typical applications of various NGS assays. Center for Research Informatics Bioinformatics Core last updated May 2015

Paired End RNASeq - De novo Assembly RNASeq - Splicing ChIP Seq Epigenetic modifications DNA SNP Identification DNA Indel identification DNA Structural variants Single End RNASeq - Counting ChIP-Seq - Counting Read Length: 50bp reads are typically sufficient for read mapping to the reference genome, and RNASeq counting experiments. >100bp reads are useful for whole genome and transcriptome studies based on the application. Replication: Samples must be sequenced with replicates to identify sources of variance and increase statistical power to separate true biological variance from technical variance. Biological replicates are critical whereas technical replicates are typically not required. Cutting back replicates to reduce cost might seem like a good option, but remember: A sample or sequencing run can fail, and lead to repeating the experiment. In general, 4 biological replicates per experiment are recommended, however, 3 replicates if also reasonable. Please consult with us with further questions. You can also use http://bioinformatics.bc.edu/marthlab/scotty/help.html for calculation of power from your pilot data. Randomization: Assign individuals at random to different groups to reduce bias. We recommend randomization of samples such that each sequencing lane contains samples from all experimental groups. Please refer to Blocking and Multiplexing below to understand how to do this. Blocking & Multiplexing: Distribute samples across various lanes on the flowcell to avoid lane effects. Use multiplexing effectively for balanced block designs. (Fig.1) But all samples cannot be sequenced on each lane as the number of unique barcodes for each lane also limits us. Solution: Balanced incomplete block design. Block what you can and randomize what you cannot. Box, Hunter, & Hunter (1978)

Group! Biological replicates! A! B! 1! 2! 3! 1! 2! 3! RNA extraction! R1! R2! R3! R1! R2! R3! Flowcell! Flowcell! L1! L2! L3! L4! L5! L6! Lane1 Lane2 Lane3 Lane4 Lane5 Lane6! L1! L1! L1! L1! L1! L1! L2! L2! L2! L2! L2! L2! L3! L3! L3! L3! L3! L3! L4! L4! L4! L4! L4! L4! L5! L5! L5! L5! L5! L5! L6! L6! L6! L6! L6! L6! Lane1 Lane2 Lane3 Lane4 Lane5 Lane6! If, I= Number of groups/treatments J= Number of biological replicates per treatment s= Number of unique barcodes that can be added in one lane L= Number of lanes sequenced T=Total number of technical replicates T = sl JI If s<i, complete block design is not possible. [1] depth: The following table provides general recommendations for coverage/reads (https://genohub.com/recommended-sequencing-coverage-by-application/) for typical read lengths for the HUMAN genome. Please visit https://genohub.com/nextgeneration-sequencing-guide/#reads for typical number of reads/lane for various commonly used NGS platforms. A useful resource from Illumina for specific coverage estimates for various Illumina instruments and genomes of different sizes is http://support.illumina.com/downloads/sequencing_coverage_calculator.html

DNA: Category Application Recommended coverage (X) or reads (in millions) Whole genome Re-sequencing 30-80X De novo assembly 100X SNP detection 10-30X Indel detection 60X Genotype calls 35X Whole exome sequencing DNA Target- Based DNA Methylation CNVs 1-8X SNVs detection 100x (3-13x local depth) Indel detection NOT recommended De novo assembly >100M ChIP-Seq 10-40X [10-14M (sharp peaks); 20-40M (broad marks)] Hi-C 100M CAP-Seq >20M RRBS (Reduced Representation Bisulfite ) Bisulfite-Seq 10X 5-15X; 30X RNA (for human/mouse genome): Please note that the number of reads your need for any type of RNASeq also depends on the desired dynamic range of expression. Category Application Recommended of mapped reads (in millions) Transcriptome Differential expression 10-25M (RNA-Seq) Alternative splicing 50-100M Allele specific expression 50-100M RNA-Target- CLIP-Seq 10-40M Based PAR-CLIP 5-15M RIP-Seq 5-20M Small RNA Differential Expression ~1-2M (microrna) Discovery ~5-8M

Microarray Experiments A very useful resource for microarray design is: http://discover.nci.nih.gov/microarrayanalysis/experimental.design.jsp Balanced samples o Same amount of cases and controls o Matched phenotypes: gender, age, etc. Biological replicates o Pure background to avoid biological variation o More replicates are needed if there is larger variation between individuals and small difference between groups Avoid technical variation o Process sample at same condition as much as possible o Technician, reagents, time, procedures Randomize samples on array o Avoid confounding technical and biological factors o Randomly put samples on different array slides and positions

Frequently Asked Questions What if I do not have replicates of data points? Understand the limitations of un-replicated data! You cannot separate technical variance from biological variance, thus, the results only apply to the data points sequenced but cannot be extrapolated to the population. What is difference between Biological replicates and technical replicates? Technical replicates: measure quantity from 1 source. This measures the reproducibility of the results. The differences are based only on technical issues in the measurement. (I weigh myself three times, do I get different weights? How different?) Biological replicates: measure a quantity from difference sources under the same conditions. Tumors from 5 different people with lung cancer may show similar gene expression patterns. These replicates are useful to show what is similar in your replicates and how they are different from a different set of conditions (ie. treated, normal). Biological variation is intrinsic to all organisms; it may be influenced by genetic or environmental factors, as well as by whether the samples are pooled or individual. Technical variation is introduced during the extraction, labeling and hybridization of samples. Measurement is associated with reading the fluorescent signals, which may be affected by factors such as dust on the array. References 1. P. L. Auer and R. W. Doerge. 2010. Statistical design and analysis of RNA sequencing data. Genetics 185:405-416.