Form for publishing your article on BiotechArticles.com


Your Article:

Article Title (3 to 12 words): Transcriptomics Sequencing and lncRNA Sequencing Analysis: Quality Evaluation and Genome Alignment

Article Summary (in short, what is your article about; just 2 or 3 lines): Data analysis is the key to successfully interpreting transcriptomics sequencing and lncRNA sequencing, and quality evaluation and genome alignment are the basis of that analysis.

Category: Bioinformatics

Your full article (between 500 and 5000 words; do check for grammatical errors or spelling mistakes):

Data analysis is the key to successfully interpreting transcriptomics sequencing and lncRNA sequencing, and quality evaluation and genome alignment are the basis of that analysis.

There are basically two strategies for lncRNA sequencing. Based on the first strategy, we take as an example the analysis of transcriptomics sequencing of liver tissues from a patient with advanced liver cancer. Four tissues are sampled: adjacent tissue (N), primary site (P), metastasis (M), and portal vein thrombosis metastasis (V). They represent four stages of liver cancer. First, we extract all poly(A) RNA and then perform mRNA sequencing. Sequencing is done on the Illumina HiSeq 2000 with a library insert size of 300 bp, paired-end reads of 100 bp, and a dUTP strand-specific library.

The method of transcriptomics analysis

RNA identification determines the quantity of total RNA, lncRNA, and messenger RNA. RNAs expressed at different levels may have different functions, so the specific quantity of each kind of RNA needs to be confirmed. Variation analysis examines changes in structure and expression level between RNAs. Finally, functional annotation is needed to work out the functions and cellular pathways of the RNAs.

Quality evaluation

Paired-end sequencing data look like this:

N_R1.fastq
    @HWI-EAS724_0001:8:32:374:374#0/1
    GAGCTGTATATGAATAATAGTTCGTTTTTCATTATCCAAGATGGATCGGTATAAAGTCTGCTAAAATAAAGGTACAACG
    +HWI-EAS724_0001:8:32:374:374#0/1
    fcfcfggdfggggfggggcggggggggfgggggcgggfwgggggggggfgcggdgcgcggggfacbbb][bgcgggggd

N_R2.fastq
    @HWI-EAS724_0001:8:32:374:374#0/2
    TACCGTTAATAGCAGTAATATCATAATAGTAATAGCATCATAACGGTAGTCCCATAAAAGTGTGTCAGTAGTAGTAGTA
    +HWI-EAS724_0001:8:32:374:374#0/2
    ggggfgggggd_adcggggeggfggeggegf`geececdegggggfegcfegggegggfgac[aced`bd \_c[[yb

We will have four pairs of files like this, one pair per tissue. The first and third lines of each record give the name of the read. The second line is the base sequence. The last line gives the quality of every base, encoded as characters. Each character has a corresponding ASCII code. For Illumina 1.3 through 1.7, subtract 64 from the ASCII code to get the quality score of the base. For Sanger (and for Illumina 1.8 and later), subtract 33. With the quality score we can calculate the sequencing error rate. The formulas relating the quality score to the error rate are given below.
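As a minimal Python sketch of the offset rule described above (the function names are hypothetical, chosen only for illustration), the following decodes a quality string and converts scores to error probabilities. It assumes the standard Phred relation Q = -10 log10 P, rearranged to P = 10^(-Q/10), and the older Solexa relation Q = -10 log10(P / (1 - P)), rearranged to P = 1 / (1 + 10^(Q/10)).

```python
def decode_quality(qual, offset=64):
    """Turn a FASTQ quality string into integer scores by subtracting the ASCII offset.

    offset=64 is assumed for Illumina 1.3 to 1.7 data; use offset=33 for Sanger encoding.
    """
    return [ord(ch) - offset for ch in qual]


def phred_error_rate(q):
    """Sanger/Phred score: Q = -10 log10 P, so P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def solexa_error_rate(q):
    """Solexa score (before v1.3): Q = -10 log10(P / (1 - P)), so P = 1 / (1 + 10^(Q/10))."""
    return 1 / (1 + 10 ** (q / 10))


# First eight quality characters of the N_R1 read shown above
scores = decode_quality("fcfcfggd", offset=64)
error_rates = [phred_error_rate(q) for q in scores]
print(scores)
print(error_rates)
```

With an offset of 64, these eight characters decode to scores between 35 and 39, which corresponds to error rates well below one in a thousand.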

$$Q_{\text{sanger}} = -10 \log_{10} P$$

$$Q_{\text{solexa, prior to v1.3}} = -10 \log_{10} \frac{P}{1 - P}$$

With these formulas we can calculate the error rate, P, which directly indicates the quality of sequencing. In practice, some tools give the same result more quickly and conveniently. I highly recommend FastQC. Its results are shown as graphs like these:

[Figures Q1 and Q2: FastQC graphs]

Q1 shows the quality distribution of the bases, with quality ranges shown in different colors. Red marks an error rate of one in a hundred or higher. If twenty percent or more of your sequences fall in this zone, the sequencing result is probably of low quality. Q2 shows the sequence content at every base position. Generally the four lines, one color per base, lie very close to each other. If part of the read is not evenly distributed, you should look for the reason or trim that part before continuing.

Genomic alignment

We choose TopHat here for genomic alignment. This tool is powerful for mapping RNA-Seq reads and discovering splice junctions. With a dual alignment strategy, mapping reads to the genome with TopHat and to coding genes with Bowtie, we can judge how well the reads map. We use a dual strategy because a high alignment rate alone does not always mean a good result.
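The alignment rate used in this comparison can be estimated from any SAM file, such as the ones Bowtie writes. The sketch below is an illustration under assumptions, not part of the TopHat or Bowtie workflow itself: the function name and the example records are hypothetical, a record counts as mapped when the unmapped bit (0x4) of its SAM FLAG field is clear, and secondary and supplementary records are skipped. In practice a tool such as samtools flagstat reports the same figure.

```python
def mapping_rate(sam_lines):
    """Return the fraction of primary SAM records whose FLAG lacks the unmapped bit (0x4)."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated column
        # Skip secondary (0x100) and supplementary (0x800) records
        if flag & 0x900:
            continue
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


# Hypothetical records: one properly mapped pair and one unmapped pair
example = [
    "@HD\tVN:1.0",
    "r1\t99\tchr1\t100\t255\t4M\t=\t200\t104\tACGT\tIIII",
    "r1\t147\tchr1\t200\t255\t4M\t=\t100\t-104\tACGT\tIIII",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
rate = mapping_rate(example)
print(rate)
```

To apply it to a real file, pass an open file handle, for example `mapping_rate(open("N.sam"))`, once for the genomic alignment and once for the transcriptomic alignment, and compare the two rates.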

As shown in Q3, a high genomic alignment rate with a low transcriptomic alignment rate often means DNA contamination (since the reads should come from RNA). If both rates are low, you should review every step of your experiment, such as the poly(A) extraction.

Action command

Quality evaluation (FastQC does not create the output directory, so create it first)

    mkdir -p QC_outdir_N QC_outdir_P QC_outdir_M QC_outdir_V
    fastqc -o QC_outdir_N N_R1.fastq N_R2.fastq
    fastqc -o QC_outdir_P P_R1.fastq P_R2.fastq
    fastqc -o QC_outdir_M M_R1.fastq M_R2.fastq
    fastqc -o QC_outdir_V V_R1.fastq V_R2.fastq

Genomic alignment (hg19 is the Bowtie2 index of the genome)

    tophat -o tophat_outdir_n --library-type fr-firststrand --fusion-search hg19 N_R1.fastq N_R2.fastq
    tophat -o tophat_outdir_P --library-type fr-firststrand --fusion-search hg19 P_R1.fastq P_R2.fastq
    tophat -o tophat_outdir_m --library-type fr-firststrand --fusion-search hg19 M_R1.fastq M_R2.fastq
    tophat -o tophat_outdir_v --library-type fr-firststrand --fusion-search hg19 V_R1.fastq V_R2.fastq

Transcriptomic alignment

Bowtie has no output-directory option (its -o flag sets the index offrate), so create the directories first and write each SAM file into its directory:

    mkdir -p bwt_outdir_n bwt_outdir_P bwt_outdir_m bwt_outdir_v
    bowtie refgene -1 N_R1.fastq -2 N_R2.fastq -S bwt_outdir_n/N.sam
    bowtie refgene -1 P_R1.fastq -2 P_R2.fastq -S bwt_outdir_P/P.sam
    bowtie refgene -1 M_R1.fastq -2 M_R2.fastq -S bwt_outdir_m/M.sam
    bowtie refgene -1 V_R1.fastq -2 V_R2.fastq -S bwt_outdir_v/V.sam

About Author:

Your Full Name (published): Sherry Green

A few lines about you (published): Sherry Green, from CD Genomics (www.cd-genomics.com), a biotechnology company that provides sequencing, genotyping, and microarray services for researchers worldwide.

Terms - Do not remove or change this section (it should be emailed back to us as is)