Sanger vs Next-Gen Sequencing

Tools and Algorithms in Bioinformatics
GCBA815/MCGB815/BMI815, Fall 2017
Week-8: Next-Gen Sequencing RNA-seq Data Analysis

Babu Guda, Ph.D.
Professor, Genetics, Cell Biology & Anatomy
Director, Bioinformatics and Systems Biology Core
University of Nebraska Medical Center

Fall, 2017 GCBA/MGCB/BMI 815

Sanger vs Next-Gen Sequencing
Source: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahukewj356gajzwahxezfqkhzrlch0qjrwibw&url=http%3a%2f%2fslideplayer.com%2fslide%2f11674461%2f&psig=aovvaw3bhydsg4jhy9z4y3jc11iy&ust=1507933065294289

Next-Gen Sequencing
No in vivo cloning
Source: https://bloggenohub.files.wordpress.com/2015/01/slide1.jpg

Cost of Human Genome Sequencing
Source: http://blog.dnanexus.com/wp-content/uploads/2017/04/screen-shot-2017-04-24-at-11.40.38-am.png

Next-Gen Sequencing Workflow
Source: Lu and Shen, 2016, Biochemistry, Genetics and Molecular Biology. DOI: 10.5772/61657

Applications of NGS
Genome
  - Whole genome sequencing
  - Whole exome sequencing
  - Targeted gene panels (cancer, newborns, autism, etc.)
Transcriptome
  - Whole RNA sequencing
  - mRNA transcriptome (poly-A selection)
  - Small RNA analysis (siRNA, snoRNA, lincRNA, etc.)
  - Gene expression profiling for selected target genes
Metagenome
  - Bulk sequencing of many types of bacteria
  - Examples: human gut microbiome, soil samples, food contamination, extremophiles, etc.
Epigenome
  - Chromatin Immunoprecipitation Sequencing (ChIP-Seq)
  - Methylation Sequencing (Methyl-Seq)

Different Sequencing Libraries
Source: http://slideplayer.com/7847747/25/images/7/types+of+sequencing+libraries.jpg

Paired-end Sequencing
Source: https://assets.illumina.com/content/dam/illumina-marketing/images/science/v2/web-graphic/paired-end-vs-singleread-seq-web-graphic.jpg

FASTQ Files from Paired-end Sequencing
Source: https://bioinf-galaxian.erasmusmc.nl/galaxy/

Demultiplexing Mixed Samples
Source: https://www.illumina.com/content/dam/illumina-marketing/images/technology/multiplexing-overview-figure.gif

Different File Types in NGS Analysis
  - FASTQ file: generated by the sequencer; contains the NGS reads
  - SAM file: Sequence Alignment/Map, generated by aligning the NGS reads to the reference genome
  - BAM file: binary version of the SAM file (SAMtools is used to manipulate SAM/BAM files)
  - GFF file: General Feature Format, used to hold genome annotation (chromosome, strand, frame, exon, CDS, etc.)
  - GTF file: Gene Transfer Format; contains all the information in GFF plus gene annotation information
  - VCF file: Variant Call Format, used to store variant data such as SNPs, InDels, and short structural rearrangements

FASTQ
  @SRR098401.11403008/1
  GAGGCTATAGCATGGTCAAGGCACAAGAAGATCACTGGACTGCCCTCGCTCAGCCCTCAGCTACTG
  +
  >>?>?@>?>@@>?@@=@@@@@??>??@??@?@A?>@@@?>@@???A@:@A@@A@@@A@@AAB@@BB

Row 1: Read identifier beginning with "@", with information from the sequencer about where this read was located on the plate (flow cell)
Row 2: The sequence
Row 3: Separator line beginning with "+" (optionally followed by a repeat of the identifier or other metadata)
Row 4: Quality scores, one per nucleotide in the sequence
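The four-row layout described above can be checked with a few lines of Python. This is a minimal sketch rather than a full FASTQ parser: the names `FastqRecord`, `parse_fastq_record`, and `phred_scores` are hypothetical, and the quality offset of 33 (the Sanger / Illumina 1.8+ convention) is an assumption, since the slide does not say which encoding the example read uses.

```python
from typing import List, NamedTuple, Sequence


class FastqRecord(NamedTuple):
    read_id: str   # row 1 without the leading "@"
    sequence: str  # row 2
    quality: str   # row 4, one ASCII character per base


def parse_fastq_record(rows: Sequence[str]) -> FastqRecord:
    """Validate and unpack the four rows of a single FASTQ record."""
    if len(rows) != 4:
        raise ValueError("a FASTQ record has exactly four rows")
    header, sequence, separator, quality = (row.strip() for row in rows)
    if not header.startswith("@"):
        raise ValueError("row 1 must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("row 3 must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("rows 2 and 4 must have the same length")
    return FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode ASCII quality characters (offset 33 is an assumption)."""
    return [ord(char) - offset for char in quality]


# The example read shown on the slide.
example_rows = [
    "@SRR098401.11403008/1",
    "GAGGCTATAGCATGGTCAAGGCACAAGAAGATCACTGGACTGCCCTCGCTCAGCCCTCAGCTACTG",
    "+",
    ">>?>?@>?>@@>?@@=@@@@@??>??@??@?@A?>@@@?>@@???A@:@A@@A@@@A@@AAB@@BB",
]

record = parse_fastq_record(example_rows)
scores = phred_scores(record.quality)
print(record.read_id, len(record.sequence), min(scores), max(scores))
```

For real files, an established library such as Biopython is usually a better choice than hand-written parsing.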

FASTQ format
FASTQ is based on the popular FASTA format for sequences.

FASTA format
  >sequence_id; header in one line
  AGTTGTAGTCCGTGATAGTCGGATCGG

FASTQ format provides additional information that includes the quality score.
  @20FUKAAXX100202:1:64:10634:114560/1
  TTGTATTTTTAGTAGAGACGGAGTTTCGCCATGTTGGTCAGGCTGGCCTCGAATTCCTGACCTCAAGTGATCCGCCCGCCTCGGCCTCCCAACGTTTTGG
  +
  ?=@7=>B==;;BB?<B?=8539<6?6>8>=BB<<B=08:9@5;:A@@?@9:BAAA<?;8;@AC@BBBBBA?<9-@B@;CAA77<:BEB<BB@07?@=<?84

The last row is the ASCII code for the quality score (Phred score, ranges from 0-50). The ASCII characters are in increasing order of quality: "!" is the worst and "~" is the best.

Sequence Alignment / Map (SAM / BAM)
One tab-separated record, wrapped here for display:
  SRR098401.104031357 83 chr22 17445857 60 76M = 17445512 -421
  ACTGTTACCAGATCAAGAACTGATAGGGACAGGGATCATTATTCCCCCTTTACAGATGAGAAGGCCGTCACGCCTC
  @@>>B@@@BBAAAB9A@@>:@@?=A@?@?@A???>?@??=???@@@@@>@>>@@@><??@>@>@@8?>?=:@>?>>
  BD:Z:NOJKPQQQQMONOMKKKLNOMNLLLJLMINLJLMLMLKKKKJLJJJMKCKLINJMMLJKKKMOOMNNOLPQSNMKK
  PG:Z:MarkDuplicates RG:Z:NA12878
  BI:Z:OOMLRRPPRPPQQONOLOPOONOOOKLNMONJKMNONMMMMLMKKKMLGMNLNMMNNJMJLNOMLNMPNONONNMM
  NM:i:0 MQ:i:60 AS:i:76 XS:i:0

A SAM record is similar to a FASTQ record in that it contains the raw sequence and its quality scores. It also tells you where the sequence aligned to the genome, and how well (this score is also Phred-scaled). In this case, the read aligned to chromosome 22, position 17445857, with a mapping quality of 60 (a 1 in 1,000,000 chance of being placed incorrectly).
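As a rough illustration of how the SAM record above breaks down, the sketch below splits one line into the eleven mandatory columns defined by the SAM specification and converts the mapping quality back to a probability. The function names are hypothetical, and the optional tags from the slide are shortened to three for brevity; for real data a library such as pysam or the SAMtools command line is the usual route.

```python
from typing import Any, Dict

# The eleven mandatory SAM columns, in order.
SAM_COLUMNS = (
    "qname", "flag", "rname", "pos", "mapq", "cigar",
    "rnext", "pnext", "tlen", "seq", "qual",
)
INTEGER_COLUMNS = ("flag", "pos", "mapq", "pnext", "tlen")


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Split one tab-separated alignment line into named fields."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_COLUMNS):
        raise ValueError("a SAM alignment line has at least 11 columns")
    record: Dict[str, Any] = dict(zip(SAM_COLUMNS, columns))
    for name in INTEGER_COLUMNS:
        record[name] = int(record[name])
    record["tags"] = columns[len(SAM_COLUMNS):]  # optional TAG:TYPE:VALUE fields
    return record


def mapq_to_error_probability(mapq: int) -> float:
    """Mapping quality is Phred-scaled: P(wrong placement) = 10^(-MAPQ/10)."""
    return 10.0 ** (-mapq / 10.0)


# The record from the slide, with only three of its optional tags kept.
example_line = "\t".join([
    "SRR098401.104031357", "83", "chr22", "17445857", "60", "76M",
    "=", "17445512", "-421",
    "ACTGTTACCAGATCAAGAACTGATAGGGACAGGGATCATTATTCCCCCTTTACAGATGAGAAGGCCGTCACGCCTC",
    "@@>>B@@@BBAAAB9A@@>:@@?=A@?@?@A???>?@??=???@@@@@>@>>@@@><??@>@>@@8?>?=:@>?>>",
    "NM:i:0", "MQ:i:60", "AS:i:76",
])

alignment = parse_sam_line(example_line)
is_reverse_strand = bool(alignment["flag"] & 16)  # flag bit 0x10: read on reverse strand
print(alignment["rname"], alignment["pos"], alignment["mapq"], is_reverse_strand)
print(mapq_to_error_probability(alignment["mapq"]))
```

The flag value 83 is a bit field (1 + 2 + 16 + 64): the read is paired, mapped in a proper pair, on the reverse strand, and first in its pair.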

Variant Call Format (VCF)

RNA-Seq Data Analysis

Computational Analysis of RNA-Seq Data
Source: Conesa et al., Genome Biology, 2016, 17:13

RNA-Seq Data Analysis Workflow
  - Illumina, Ion Torrent, PacBio
  - FastQC, FQTrim
  - STAR, HISAT, TopHat, Sailfish, Salmon
  - Cufflinks, edgeR, DESeq
  - CuffDiff, DESeq, edgeR, Limma
  - GSEA, IPA, DAVID, GO, etc.

Input Files for RNA-seq Analysis
Download the Test Data file from the Course Page and unzip the folder.

Galaxy Server
https://usegalaxy.org/
  - A large compilation of open-source NGS data analysis tools that are accessible to users on web-based platforms
  - Data can be uploaded from a PC/Mac and computing can be done on the cloud
  - No need to install tools and maintain servers locally
  - In-depth tutorials are available for using Galaxy services
  - A list of Public Galaxy Servers can be found at https://galaxyproject.org/public-galaxy-servers/
  - Today's RNA-seq analysis will be performed from the following link: https://bioinf-galaxian.erasmusmc.nl/galaxy/

Phred Score (Q) explained
Phred score (Q) vs error probability (P): Q = -10 log10 P

Base Sequence Quality Interpretation
[Figure panels: Bad Quality; Excellent Quality; Quality drops at the tail end; Bad Quality]
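The standard Phred definition, Q = -10 log10 P, can be turned into a small conversion table. The sketch below is illustrative only; the function names are hypothetical and the Q values of 10 to 40 were picked as common reference points, not taken from the slide.

```python
import math


def error_probability_to_phred(p: float) -> float:
    """Q = -10 * log10(P), where P is the probability that a base call is wrong."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


def phred_to_error_probability(q: float) -> float:
    """Inverse relation: P = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


# Each step of 10 in Q corresponds to a tenfold drop in error probability.
for q in (10, 20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:g}, base call accuracy {100 * (1 - p):.2f}%")
```

Because the scale is logarithmic, a drop from Q30 to Q20 at the tail of a read means ten times as many expected errors per base, which is why trimming low-quality tails is a common step before mapping.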

Read Mapping and Assembly
Source: https://home.cc.umanitoba.ca/~frist/plnt7690/lec12/lec12.3.html

Downstream Analysis of RNA-seq Results
  - Hierarchical Clustering
  - IPA: Ingenuity Pathway Analysis
  - GSEA: Gene Set Enrichment Analysis
Sources:
  - Yoo et al., Nature Genetics, 2014
  - Li et al., Scientific Reports, 2015
  - Graner et al., Front. Oncology, 2015
  - Bee et al., PLoS ONE, 2011