Introduction to RNA-Seq in GeneSpring NGS Software. Dipa Roy Choudhury, Ph.D., Strand Scientific Intelligence and Agilent Technologies. Learn more at www.genespring.com
Introduction to RNA-Seq: In just a few years, massively parallel cDNA sequencing, or RNA-Seq, has enabled many advances in the characterization and quantification of transcriptomes. Rapidly decreasing sequencing costs and massively parallel sequencing technologies have dramatically increased the quantity of data that needs to be analyzed. What is needed, therefore, is a tool that enables the analysis and integration of data produced on multiple platforms and with multiple methods. Agilent has designed NGS analysis in GeneSpring with the biologist in mind: someone who wants to answer real biological questions without having to become a bioinformatics expert.
Agilent SureSelect Target Enrichment. GeneSpring NGS: 5. Map reads to the reference genome. 6. Quality control on mapping. 7. Detect SNPs/InDels or differentially spliced genes. 8. Find biological relevance for your results.
GeneSpring NGS Provides Downstream Analysis of Next-Gen Sequencing Data. [Diagram: Primary Analysis (control software) yields a data file of reads and quality values; Secondary Analysis (ELAND/BioScope/BWA) yields reads aligned to the genome; Tertiary Analysis is GeneSpring NGS. File formats shown: FASTQ, BAM.]
Questions we can seek to answer using the GeneSpring RNA-Seq workflow: What are the differentially expressed genes? What are the differentially spliced genes? Are there any SNPs in the transcriptome? Can we identify gene fusion events? Can we identify novel genes, novel exons, and novel splice junctions?
Import Data and Annotations
Download Human Build hg18 from the Agilent Server: Open the Human tree and select hg18 and Homologene Groups. Annotation files can be quite large. Click [UPDATE] to start the download.
Baits and Targets for SureSelect Catalog Kits Are Pre-loaded: List of Agilent SureSelect Catalog Kits
Creating a New Experiment: Click the New Experiment icon to start a new experiment, or select New Experiment from the Project menu.
Create New GeneSpring NGS Experiment: Provide a useful, descriptive name. Be sure to select NGS as the Analysis type.
Importing Data: Choosing Metadata. Indicate organism, build, and transcript model. Prepackaged annotations are available for a variety of organisms; new organisms are supported on demand. Support for single-end, paired-end, and mate-pair protocols. Support for Illumina, Life Technologies, and Roche 454 platforms.
Provide information on sequencing platform and library layout for SureSelect-specific kits. Be sure to select the previously loaded target region.
GeneSpring NGS Organization of the Windows: Project and Experiment Navigator, Workflow Browser, Region View.
Quality Control
Perform QC on reads and filter anomalous reads: View reads by tile/lane and remove reads in anomalous tiles. Figure labels: perfect-match reads; too many mismatch reads in this tile. Base qualities in the same tiles are also bad.
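As a rough illustration of tile-based filtering, the sketch below flags tiles whose mean mismatch count per read exceeds a cutoff and then drops the reads from those tiles. The tuple layout, the function names, and the 2.0 cutoff are assumptions made for this example; they are not GeneSpring NGS settings, and the software's own criteria may differ.

```python
from collections import defaultdict

def anomalous_tiles(reads, max_mean_mismatches=2.0):
    """reads: iterable of (lane, tile, mismatches) tuples (hypothetical layout).
    Returns the set of (lane, tile) keys whose mean mismatches exceed the cutoff."""
    totals = defaultdict(lambda: [0, 0])  # (lane, tile) -> [mismatch sum, read count]
    for lane, tile, mismatches in reads:
        entry = totals[(lane, tile)]
        entry[0] += mismatches
        entry[1] += 1
    return {key for key, (mm, n) in totals.items() if mm / n > max_mean_mismatches}

def drop_anomalous(reads, bad_tiles):
    """Keep only reads whose (lane, tile) was not flagged."""
    return [r for r in reads if (r[0], r[1]) not in bad_tiles]

# Toy data: tile 2 of lane 1 carries far more mismatches than tile 1.
reads = [(1, 1, 0), (1, 1, 1), (1, 2, 4), (1, 2, 5)]
bad = anomalous_tiles(reads)
kept = drop_anomalous(reads, bad)
```

A per-tile base-quality average could be screened the same way by swapping the mismatch count for a mean quality score and reversing the comparison.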
Quality Inspection: Open the Quality Control Manager. Press Compute to calculate the library QC metrics.
QC Manager feature in SureSelect experiments: Off-target reads can be removed to focus analysis. Figure label: Targeted Regions (SureSelect Baits). July 11
Determine the expression values for each gene
Run Quantification to determine raw gene counts. Which reads contribute to a gene's count? Only reads overlapping exonic regions contribute to the read count; reads that do not overlap an exon do NOT contribute. Multiply mapping reads contribute fractionally to the count.
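The counting rules above can be sketched in a few lines. This is a minimal illustration, not the GeneSpring NGS implementation: it assumes half-open coordinates, and it assumes a read mapped to n locations adds 1/n at each exonic location, which is one common way to make multiply mapping reads "contribute fractionally". All names and coordinates are hypothetical.

```python
from collections import defaultdict

def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end

def count_reads(alignments, exons):
    """alignments: list of (read_id, chrom, start, end), one entry per mapping location.
    exons: list of (gene, chrom, start, end).
    Returns gene -> count; a read with n locations adds 1/n per exonic location
    (assumed weighting)."""
    n_locations = defaultdict(int)
    for read_id, _, _, _ in alignments:
        n_locations[read_id] += 1
    counts = defaultdict(float)
    for read_id, chrom, start, end in alignments:
        hit_genes = {g for g, c, s, e in exons
                     if c == chrom and overlaps(start, end, s, e)}
        for gene in hit_genes:
            counts[gene] += 1.0 / n_locations[read_id]
    return dict(counts)

exons = [("GENE_A", "chr1", 100, 200), ("GENE_B", "chr2", 500, 600)]
alignments = [
    ("r1", "chr1", 150, 186),  # unique and exonic: counts fully for GENE_A
    ("r2", "chr1", 300, 336),  # unique but outside any exon: no contribution
    ("r3", "chr1", 190, 226),  # multiply mapped, location 1 overlaps GENE_A
    ("r3", "chr2", 550, 586),  # multiply mapped, location 2 overlaps GENE_B
]
counts = count_reads(alignments, exons)
```

Here GENE_A receives one full read plus half of r3, and GENE_B receives the other half.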
Quantification: Quantify genes, transcripts, and exons. Unchecked will only count reads falling completely inside an exon. Reverted back to the All Aligned Reads list to establish a baseline; feel free to try a different (filtered) read list. New genes and exons are discovered using conservation data (Conservation track in Annotations).
Filter genes by RPKM: Results of Filtering Profile plot of genes that pass the filter criteria
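For reference, RPKM (reads per kilobase of exon model per million mapped reads) divides a gene's read count by its exonic length in kilobases and by the total mapped reads in millions. The sketch below computes it and applies a threshold filter; the function names and the cutoff of 1.0 are illustrative assumptions, not GeneSpring NGS defaults.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """RPKM = count * 1e9 / (exonic length in bp * total mapped reads)."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)

def filter_by_rpkm(gene_rpkm, min_rpkm=1.0):
    """Keep genes whose RPKM is at or above the (hypothetical) cutoff."""
    return {gene: value for gene, value in gene_rpkm.items() if value >= min_rpkm}

# 500 reads on a gene with 2,000 exonic bases, in a sample of 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
passed = filter_by_rpkm({"GENE_A": value, "GENE_B": 0.2})
```

In this toy case GENE_A has an RPKM of 25 and passes, while GENE_B falls below the cutoff.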
Handling overlapping genes: Which reads contribute to a gene's count? Reads in the overlap contribute to both genes, except for ABI data, which is strand-specific. Figure labels: gene on positive strand; overlapping gene on negative strand.
Determine expression values normalized across samples
Scatter Plot between Two Replicates
Detection of differentially expressed genes
Output of Differential Expression Analysis: p-values, corrected p-values, and fold changes. Volcano plot.
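To make the three outputs concrete, the sketch below computes a log2 fold change and a multiple-testing correction. The slide does not say which correction GeneSpring NGS applies; Benjamini-Hochberg is used here purely as a familiar example, and the pseudocount of 1.0 is likewise an assumption.

```python
import math

def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """log2 of condition B over condition A; the pseudocount avoids division by zero."""
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

lfc = log2_fold_change(3.0, 15.0)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A volcano plot then places each gene at its log2 fold change on the x-axis and the negative log10 of its p-value on the y-axis.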
Determine differentially spliced genes
Identifying Differentially Spliced Genes: Compute the proportion of a gene's count that can be ascribed to a particular transcript. If the proportion for a particular transcript changes substantially across conditions, the gene is said to be differentially spliced.
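The proportion-based definition can be illustrated as follows. The sketch assumes transcript-level counts are already available for each condition, and it uses a hypothetical cutoff of 0.2 for "changes substantially"; neither the cutoff nor the function names come from GeneSpring NGS.

```python
def transcript_proportions(transcript_counts):
    """Fraction of the gene's total count ascribed to each transcript."""
    total = sum(transcript_counts.values())
    if total == 0:
        return {t: 0.0 for t in transcript_counts}
    return {t: c / total for t, c in transcript_counts.items()}

def max_proportion_shift(counts_a, counts_b):
    """Largest absolute change in any transcript's proportion between conditions."""
    pa = transcript_proportions(counts_a)
    pb = transcript_proportions(counts_b)
    return max(abs(pa.get(t, 0.0) - pb.get(t, 0.0)) for t in set(pa) | set(pb))

normal = {"tx1": 80, "tx2": 20}
tumor = {"tx1": 30, "tx2": 70}
shift = max_proportion_shift(normal, tumor)
differentially_spliced = shift >= 0.2  # hypothetical cutoff
```

Note that the gene's total count is 100 in both conditions, so it would not look differentially expressed even though its transcript mix has shifted.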
The Challenge: Deconvoluting Transcript Read Counts Which of the 4 transcripts do these reads come from?
Differential Splicing Analysis: View Results in Gene View. Ensure the Splicing Analysis Results entity list is selected. Click the Gene View icon to show Gene View.
Differential Splicing Analysis: Gene View. Figure labels: possible new exon; gene's RPKM; 4 transcript RPKMs; one transcript is expressed less in Tumor; another transcript is expressed more in Tumor.
Determine SNPs in the transcriptome
GeneSpring NGS has a built-in SNP calling algorithm. Set filters for SNP statistical significance. Set filters for the minimum number of overlapping reads and the minimum number of overlapping variant reads.
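The two coverage filters can be pictured with a short sketch. It covers only the read-count thresholds, not the statistical significance score, whose calculation the slide does not describe. The record fields and default thresholds are hypothetical.

```python
def filter_snp_candidates(candidates, min_total_reads=10, min_variant_reads=3):
    """Keep candidates that meet both coverage thresholds (hypothetical defaults)."""
    return [
        c for c in candidates
        if c["total_reads"] >= min_total_reads
        and c["variant_reads"] >= min_variant_reads
    ]

candidates = [
    {"pos": ("chr1", 1000), "total_reads": 40, "variant_reads": 18},
    {"pos": ("chr1", 2000), "total_reads": 6, "variant_reads": 5},   # too few overlapping reads
    {"pos": ("chr2", 3000), "total_reads": 50, "variant_reads": 1},  # too few variant reads
]
kept = filter_snp_candidates(candidates)
```

Raising either threshold trades sensitivity for fewer false positives, which matters in RNA-Seq because coverage varies with expression level.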
GeneSpring NGS calls transcript effects for each SNP and allows filtering of SNPs based on these effects. Figure labels: change in amino acid for nonsynonymous SNPs; types of effects predicted.
Viewing SNPs in the Genome Browser. Figure labels: color-coded indicator for a homozygous SNP; known in dbSNP; GeneSpring NGS SNP call; in a repeat region.
Determine chimeric transcripts or fusion genes
Identify Fusion Genes: In a K562 leukemia cell line, GeneSpring NGS confirms the well-known BCR-ABL1 gene fusion. Several read pairs for the BCR gene on chr22 have mates translocated to the ABL1 gene on chr9; the corresponding paired reads appear on the ABL1 gene on chr9. Filters are set on the Genome Browser to show only translocated reads.
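The idea of collecting read pairs whose mates land on different chromosomes can be sketched as below. This is a simplified illustration under stated assumptions, not the GeneSpring NGS method: it considers only inter-chromosomal pairs, requires a hypothetical minimum of two supporting pairs, and uses toy coordinates that are NOT the real positions of BCR or ABL1.

```python
from collections import Counter

def gene_at(chrom, pos, genes):
    """Return the name of the gene containing (chrom, pos), or None."""
    for name, g_chrom, start, end in genes:
        if g_chrom == chrom and start <= pos < end:
            return name
    return None

def fusion_candidates(read_pairs, genes, min_pairs=2):
    """Count pairs whose mates fall in two different genes on different chromosomes."""
    support = Counter()
    for (chrom1, pos1), (chrom2, pos2) in read_pairs:
        if chrom1 == chrom2:
            continue  # only translocated (inter-chromosomal) pairs are considered
        g1 = gene_at(chrom1, pos1, genes)
        g2 = gene_at(chrom2, pos2, genes)
        if g1 and g2 and g1 != g2:
            support[tuple(sorted((g1, g2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_pairs}

# Toy coordinates for illustration only.
genes = [("BCR", "chr22", 1000, 5000), ("ABL1", "chr9", 20000, 30000)]
read_pairs = [
    (("chr22", 1200), ("chr9", 21000)),
    (("chr22", 1500), ("chr9", 22000)),
    (("chr9", 25000), ("chr22", 4000)),
    (("chr22", 1300), ("chr22", 1600)),  # concordant pair, ignored
]
fusions = fusion_candidates(read_pairs, genes)
```

Sorting the gene names makes the pair key independent of which mate was sequenced first.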
Detection of novel genes, exons, and splice junctions
Identify Novel Exons and Genes: In a mouse myoblast study, GeneSpring NGS determines a new exon for the FHL3 gene. Read clumps are found that do not align with a known exon. An exon is added to a gene if it is close to or within the gene; otherwise it is called a new gene. The novel exon determined by GeneSpring NGS is probably a new transcription start site.
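The "close to or within the gene" rule can be expressed as a small decision function. How close counts as close is not stated on the slide, so the 1,000 bp distance below is a hypothetical value, as are the gene name and coordinates.

```python
def classify_read_clump(clump, genes, max_distance=1000):
    """clump: (chrom, start, end); genes: list of (name, chrom, start, end), half-open.
    Returns ("novel_exon", gene) if the nearest gene is within max_distance bp
    (a gap of 0 means the clump overlaps or lies within the gene),
    otherwise ("novel_gene", None)."""
    chrom, start, end = clump
    best_gene, best_gap = None, None
    for name, g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        gap = max(g_start - end, start - g_end, 0)
        if best_gap is None or gap < best_gap:
            best_gene, best_gap = name, gap
    if best_gene is not None and best_gap <= max_distance:
        return ("novel_exon", best_gene)
    return ("novel_gene", None)

genes = [("GENE_X", "chr4", 10_000, 20_000)]
upstream = classify_read_clump(("chr4", 9_500, 9_700), genes)   # 300 bp upstream
distant = classify_read_clump(("chr4", 50_000, 50_200), genes)  # 30,000 bp away
```

A clump just upstream of a gene, as in the first call, is the situation the slide interprets as a possible new transcription start site.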
Identify Novel Splice Junctions: In a brain tissue expression study, GeneSpring NGS determines a new splice junction in the DTX3 gene when considering only RefSeq transcripts; this novel splice junction is corroborated by a UCSC transcript. Solid lines show spliced reads connecting the 1st and 3rd exons of the RefSeq transcript, corresponding to the novel splice junction found by GeneSpring NGS. Indeed, a known UCSC transcript that is not present in RefSeq validates this discovery.
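Whether a junction is "novel" depends on the transcript model it is compared against, which the sketch below tries to show. It is an illustration only: junctions are represented as hypothetical (chrom, donor, acceptor) tuples with toy coordinates, and the minimum of two supporting spliced reads is an assumed cutoff.

```python
def novel_junctions(observed, known, min_reads=2):
    """observed: junction -> spliced-read count; known: set of annotated junctions.
    Returns observed junctions absent from the annotation with enough read support."""
    return {j: n for j, n in observed.items() if j not in known and n >= min_reads}

# Toy annotation: exon1-exon2 and exon2-exon3 junctions in the first model;
# the second model also contains a transcript that skips exon 2.
model_a = {("chr_toy", 200, 300), ("chr_toy", 400, 500)}
model_b = model_a | {("chr_toy", 200, 500)}

observed = {
    ("chr_toy", 200, 300): 25,
    ("chr_toy", 200, 500): 7,   # spliced reads joining exon 1 directly to exon 3
    ("chr_toy", 210, 480): 1,   # single read, below the support cutoff
}
vs_model_a = novel_junctions(observed, model_a)
vs_model_b = novel_junctions(observed, model_b)
```

The exon 1 to exon 3 junction is reported as novel against the first model and disappears against the second, mirroring the RefSeq versus UCSC comparison on the slide.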
Biological Contextualization: Pathway Analysis
Agilent GeneSpring NGS for SureSelect: Display the Results on a Pathway
Questions we can seek to answer using the GeneSpring RNA-Seq workflow: What are the differentially expressed genes? What are the differentially spliced genes? Are there any SNPs in the transcriptome? Can we identify gene fusion events? Can we identify novel genes, novel exons, and novel splice junctions?
Summary: Differential expression and splicing analysis. Novel gene, exon, and alternative splicing discovery. Gene fusion analysis. SNP and InDel discovery and annotation with dbSNP.
Summary in General: GeneSpring NGS supports both SureSelect RNA-Seq experiments and RNA-Seq experiments that don't use SureSelect. The workflow steps in the GeneSpring NGS application are application-specific and change based on whether you are analyzing a DNA-Seq or RNA-Seq experiment. It is possible to integrate data produced on multiple platforms and with multiple methods in the same GeneSpring project. Multiple visualization tools are available to query the data.
Thank you dipa@strandsi.com informatics_support@agilent.com
http://www.avadis-ngs.com