Post-sequencing quality evaluation, or what to do when you get your reads from the sequencer


The fastq file contains information about sequence and quality
  Read identifier
  Sequence
  Quality
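As a rough illustration of the three pieces of information listed above, the sketch below reads four-line FASTQ records and converts the quality string into numeric Phred scores. It assumes the common Phred+33 encoding and unwrapped four-line records; the names `FastqRecord`, `read_fastq` and `phred_scores` are made up for this example and are not part of any tool mentioned in the slides.

```python
import io
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str  # header line without the leading '@'
    sequence: str    # called bases
    quality: str     # one ASCII character per base


def read_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream (assumed layout)."""
    while True:
        header = handle.readline()
        if not header:            # end of file
            return
        if not header.strip():    # skip stray blank lines
            continue
        sequence = handle.readline().rstrip("\n")
        handle.readline()         # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        yield FastqRecord(header.rstrip("\n")[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string, assuming Phred+33 encoding."""
    return [ord(char) - offset for char in quality]


example = io.StringIO("@read1\nACGTN\n+\nIIII#\n")
for record in read_fastq(example):
    print(record.identifier, record.sequence, phred_scores(record.quality))
```

Older Illumina pipelines used a different offset (Phred+64), so the `offset` argument would need changing for such files.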

Sources of Library Read Quality Problems
  Sequencer problems: read quality
  Library problems: GC content, library complexity, adaptor/primer contamination, ribosomal RNA

Evaluating Quality

FastQC Report (html) basics

Assessing Sequencing Quality

FastQC evaluates sequencer-related quality in 3 different ways
  Per base sequence quality: average quality for each base pair
  Per tile sequence quality: average spatial quality on the flow cell
  Per sequence quality score: average quality per read
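To make the three views concrete, here is a simplified sketch of how each average could be computed from quality strings. It is not FastQC's implementation (FastQC reports distributions and, for tiles, deviations from the per-position average). It assumes Phred+33 encoding and an Illumina-style identifier of the form `instrument:run:flowcell:lane:tile:x:y`; all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def phred(quality, offset=33):
    """Decode a quality string, assuming Phred+33 encoding."""
    return [ord(char) - offset for char in quality]


def per_base_mean_quality(qualities):
    """Mean quality at each read position, across all reads."""
    totals, counts = defaultdict(int), defaultdict(int)
    for quality in qualities:
        for position, score in enumerate(phred(quality)):
            totals[position] += score
            counts[position] += 1
    return [totals[p] / counts[p] for p in sorted(totals)]


def per_read_mean_quality(qualities):
    """Mean quality of each read (empty reads are skipped)."""
    return [mean(phred(quality)) for quality in qualities if quality]


def tile_of(identifier):
    """Extract the tile number, assuming instrument:run:flowcell:lane:tile:x:y."""
    parts = identifier.split()
    if not parts:
        return None
    fields = parts[0].split(":")
    if len(fields) >= 7 and fields[4].isdigit():
        return int(fields[4])
    return None


def per_tile_mean_quality(records):
    """Mean of per-read mean quality for each tile; records are (identifier, quality)."""
    by_tile = defaultdict(list)
    for identifier, quality in records:
        tile = tile_of(identifier)
        if tile is not None and quality:
            by_tile[tile].append(mean(phred(quality)))
    return {tile: mean(values) for tile, values in by_tile.items()}
```

A tile whose mean sits well below the others would correspond to the "loss of tile quality" pattern, and a cluster of low per-read means to a drop in quality for a portion of reads.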

Per Base Sequence Quality

Mid-read drop in quality can affect mapping efficiency

Per tile sequence quality

Small loss of tile quality

Large loss of flow cell quality

Average quality per read

Drop in quality for a portion of reads

Assessing Library Quality

Nucleotide content of the reads
  Per base sequence content: % nucleotide representation at each bp
  Per sequence GC content: distribution of % GC content per read
  Per base N content: % uncalled nucleotides (assigned N) per position
  Sequence length distribution
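The four nucleotide-content measures above can be sketched in a few lines. This is an illustrative approximation rather than FastQC's own code, and the function names are invented for the example.

```python
from collections import Counter


def per_base_content(reads):
    """Percentage of A, C, G, T and N at each read position."""
    counts = []
    for read in reads:
        for position, base in enumerate(read.upper()):
            if position == len(counts):
                counts.append(Counter())
            counts[position][base] += 1
    result = []
    for column in counts:
        total = sum(column.values())
        result.append({base: 100.0 * column[base] / total for base in "ACGTN"})
    return result


def gc_percent(read):
    """Percent G+C among called bases (N is ignored); 0.0 if nothing was called."""
    called = [base for base in read.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return 100.0 * sum(base in "GC" for base in called) / len(called)


def length_distribution(reads):
    """Number of reads observed at each length."""
    return Counter(len(read) for read in reads)


reads = ["ACGT", "AGGN", "AC"]
print(per_base_content(reads)[3]["N"])   # share of N at the fourth position
print([round(gc_percent(r), 1) for r in reads])
print(length_distribution(reads))
```

Plotting the output of `per_base_content` by position is what exposes patterns such as random hexamer bias at the start of reads or adapter sequence toward the end.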

Per base sequence content

Random hexamer bias

Adaptor/adaptor product

GC content per read

Adaptor/Adaptor product

Biologically overrepresented sequence

Per base N content

Sequence length distribution

Length post-trimming

Library Content
  Sequence duplication levels: total sequences vs. de-duplicated sequences
  Overrepresented sequences: large polymer sequences
  Adaptor content: % adapter per nucleotide
  K-mer content: overrepresented 5-mers
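The library-content checks listed above all come down to counting. The sketch below counts exact duplicate reads, lists sequences above a chosen share of the library, and tallies k-mers. It is a simplified illustration, not how FastQC computes these modules; the function names and the `min_fraction` default are assumptions for this example.

```python
from collections import Counter


def duplication_summary(reads):
    """Compare total reads with the number of distinct read sequences."""
    counts = Counter(reads)
    total, distinct = len(reads), len(counts)
    percent = 100.0 * distinct / total if total else 0.0
    return {"total": total, "distinct": distinct,
            "percent_remaining_if_deduplicated": percent}


def overrepresented(reads, min_fraction=0.001):
    """Sequences making up at least min_fraction of all reads (illustrative cutoff)."""
    counts = Counter(reads)
    total = len(reads)
    return [(seq, n, 100.0 * n / total)
            for seq, n in counts.most_common()
            if n / total >= min_fraction]


def kmer_counts(reads, k=5):
    """Count every k-mer (5-mers by default) across all reads."""
    counts = Counter()
    for read in reads:
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    return counts


reads = ["AAAAA", "AAAAA", "ACGTA", "CCCCC"]
print(duplication_summary(reads))
print(overrepresented(reads, min_fraction=0.5))
print(kmer_counts(["ACGTAC"]).most_common())
```

A sequence returned by `overrepresented` can then be compared against known adapters or ribosomal RNA to decide whether the duplication is technical or biological.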

Sequence duplication levels

Low complexity Adapter/Adaptor reads

Biological sequence duplication

Overrepresented sequences: ribosomal RNA

Adaptor Content

Too-small library fragments

K-mer (5-mer) content

Don't worry, be happy! Just because your library doesn't look perfect doesn't mean it is BAD.
  Trim reads: low quality or adapter reads
  Remove duplicates ONLY IF NECESSARY
  Mapping takes into account base quality
  Get more coverage

Trimming reads
  You can trim reads to remove adapters and low-quality sequence
  We will cover how to do this in another video
  Here is a preview of how the library can change
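Until that video, the idea behind trimming can be sketched as two small steps: cut the read at an adapter match, then remove low-quality bases from the 3' end. This is a deliberately naive illustration (exact matching only, no mismatches), not the algorithm of any particular trimming tool. The adapter string, the quality threshold of 20, the Phred+33 offset and the function names are all assumptions for the example; use the adapter that matches your own library prep.

```python
def trim_adapter(sequence, quality, adapter, min_overlap=3):
    """Cut the read at the first exact adapter match, or at a partial
    adapter (at least min_overlap bases) hanging off the 3' end."""
    index = sequence.find(adapter)
    if index == -1:
        longest = min(len(adapter) - 1, len(sequence))
        for overlap in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:overlap]):
                index = len(sequence) - overlap
                break
    if index == -1:
        return sequence, quality
    return sequence[:index], quality[:index]


def trim_low_quality_tail(sequence, quality, threshold=20, offset=33):
    """Remove bases from the 3' end while their quality is below threshold."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < threshold:
        end -= 1
    return sequence[:end], quality[:end]


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix; an assumption for this sketch
seq, qual = "ACGTACGT" + ADAPTER[:5] + "", "IIIIIIII" + "I####"
seq, qual = trim_adapter(seq, qual, ADAPTER)
seq, qual = trim_low_quality_tail(seq, qual)
print(seq, qual)
```

Reads that become very short after trimming are usually dropped, which is why the sequence length distribution changes after this step.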

Per base sequence content: before trimming

Per base sequence content: after trimming

Per sequence GC content: before trimming

Per sequence GC content: after trimming

Overrepresented sequences: before trimming, after trimming