From Variants to Pathways: Agilent GeneSpring GX's Variant Analysis Workflow

Technical Overview

Introduction

Next-generation sequencing (NGS) studies have created unanticipated challenges in data mining and data storage, because a single sequencing project reports a large number of genetic variants. The scientific community has access to a plethora of tools for analyzing this data, but combining these tools to obtain biologically meaningful results is still a challenging task. While primary and secondary analysis can be automated, tertiary data exploration is largely done manually by a researcher (Figure 1).

The input to tertiary analysis is typically the list of mutations identified in secondary analysis. This information is usually stored in Variant Call Format (VCF). VCF has become an important format in modern biology because it is widely used to report variants. VCF files are flexible and can store all variant types, including single nucleotide variants, insertions and deletions, copy number variants, and structural variants.

GeneSpring GX now includes a variant analysis workflow that allows users to sort and compare VCF files, identify genes affected by a variation, and perform pathway analysis on affected genes. The workflow includes the steps in Figure 2.

Figure 1. NGS analysis can broadly be categorized into three parts. Primary and secondary analysis are computationally intensive and are usually automated. Tertiary analysis is the exploration of biologically relevant data.
Primary analysis: production of sequence data and reads
Secondary analysis: alignment; QC; variant calling on aligned data
Tertiary analysis: annotation and filtering of variants; genome browser-driven exploration; biological contextualization

Figure 2. The variant analysis workflow in Agilent GeneSpring GX allows users to import a list of SNPs for tertiary data analysis.
Workflow steps shown: Import VCF; Filter and sort variants; Annotate and compare regions; Translate regions to genes; Identify genic regions; Gene ontology analysis; Pathway analysis
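To make the VCF description above concrete, here is a minimal sketch of reading one VCF data line with the Python standard library. It is not how GeneSpring GX imports files; the names `Variant`, `parse_vcf_line`, and `classify` are hypothetical, and the sketch only handles the eight fixed tab-separated columns defined by the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).

```python
from typing import Dict, List, NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int                # 1-based position, as in the VCF specification
    ref: str
    alt: str
    qual: Optional[float]   # None when the QUAL column is "."
    info: Dict[str, object]


def parse_vcf_line(line: str) -> List[Variant]:
    """Parse one VCF data line; returns one Variant per ALT allele."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts, qual, _filter, info = fields[:8]
    info_dict: Dict[str, object] = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, _, value = item.partition("=")
        # Flag-style INFO entries (no "=") are stored as True.
        info_dict[key] = value if value else True
    quality = None if qual == "." else float(qual)
    return [Variant(chrom, int(pos), ref, alt, quality, info_dict)
            for alt in alts.split(",")]


def classify(v: Variant) -> str:
    """Rough variant type from allele lengths (illustrative only)."""
    if v.alt.startswith("<"):
        return "symbolic"   # e.g. <DEL>, <DUP> for structural variants
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    if len(v.ref) < len(v.alt):
        return "insertion"
    if len(v.ref) > len(v.alt):
        return "deletion"
    return "other"
```

Lines beginning with `#` are header lines and would need to be skipped before calling `parse_vcf_line`.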

Key Functionalities and Benefits

- Supports processed NGS data with variant call information in VCF format
- Enables simultaneous filtering of variants based on the variant-associated information in the VCF file
- Supports public and commercial databases including ClinVar, COSMIC, dbNSFP, and 1000 Genomes; this information can be used for visualization and further analysis
- Provides powerful visualization options, including an elastic genome browser for interactive query of specific variants
- Performs multi-omic and intergenomic analysis using various tools, including pathway analysis and correlation analysis

Importing and Viewing VCF Data

This workflow supports VCF files exported from tools and portals such as 1000 Genomes (http://www.1000genomes.org/home), Agilent SureCall, and Strand NGS. VCF files can be compared to identify unique or common variants, and the results can be viewed in the genome browser. The variant analysis workflow in GeneSpring GX allows users to perform tertiary analysis by translating the effect of SNPs onto biological pathways and overlaying data in a multi-omics experiment. Users can determine the effect of variants (SNPs, insertions, deletions, copy number variations, or structural variants) on genes, transcripts, and regulatory regions.

VCF files imported into GeneSpring GX are stored within the tool for analysis. Each VCF file is stored as a Region List upon import. Region Lists can be viewed individually in the Genome Browser or in a spreadsheet with their corresponding annotations. The drag-and-drop feature of the tool allows viewing of results as well as annotations. Figure 3 shows the default view in a SNP analysis workflow. Analyses can easily be performed to identify all variants common between VCFs, those unique to a given VCF, and variants commonly detected in all samples. Mutations are color-coded by subtype for easy visualization.
Figure 3. Agilent GeneSpring GX main view, showing the genome browser with its data and annotation tracks. Any track can be selected to display data as a spreadsheet.
Callouts: Data derived from the VCF analysis can be visualized as separate or merged tracks. Read coverage is plotted on the Y-axis. Annotation files (for example, TargetScan and CpG Islands) help in understanding the effect of a mutation on transcripts. The spreadsheet view of the VCF file can be sorted and copied to the clipboard.
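The comparison of VCF files into common and unique variants can be pictured as set operations. The sketch below is an illustration under stated assumptions, not GeneSpring GX's implementation: it assumes each variant is identified by a `(chrom, pos, ref, alt)` tuple, and the function name `compare_variant_sets` is hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumed identity of a variant: (chrom, pos, ref, alt)
VariantKey = Tuple[str, int, str, str]


def compare_variant_sets(samples: Dict[str, Set[VariantKey]]):
    """Return (common, unique).

    common: variants present in every sample
    unique: per-sample variants found in no other sample
    """
    common = set.intersection(*samples.values()) if samples else set()
    unique = {}
    for name, variants in samples.items():
        # Union of the variants from all the other samples.
        others = set().union(*(v for n, v in samples.items() if n != name))
        unique[name] = variants - others
    return common, unique
```

Matching on the exact allele pair is a deliberate simplification; real comparisons often need variant normalization (for example, left-aligning indels) before keys from different callers are comparable.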

Variant Filtering

The Region List Operations workflow offers the ability to filter variants and their associated data. These options are used to include or exclude certain sites from any analysis performed by the program. For example, users can remove poor-quality variants and common polymorphisms, and categorize SNPs into smaller lists that can be saved as region lists in the experiment navigator. The tool can also be used, for example, to exclude genotypes from any analysis performed by the program. GeneSpring GX also allows users to cluster a list of filtered regions. Filtered regions can be exported as a text, Browser Extensible Data (BED), or reference file.

Genomic information is increasingly used in prognosis and research, which requires visualizing and analyzing thousands of individuals and millions of variants. The variant analysis workflow in GeneSpring GX allows users to cluster variants on their zygosity score, allelic frequency, or any other value or tag that the VCF may have, across various samples or VCF files. Figure 4 is an example of a hierarchical tree created to group regions on a column value derived from the VCF file.

Figure 4. A) Hierarchical tree showing 39,912 clustered regions; B) a zoomed-in view. Columns are labeled using the default VCF file columns on the left, and the labels on the top show the variant types. The figure legend shows the color code used for the labels. The color range is determined by the column used to cluster the regions.
Legend: color range from -126 to 126; region color by variant type (deletion, insertion)
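As a rough illustration of the filtering and BED export described above, the following sketch drops low-quality variants and common polymorphisms, then writes the survivors as BED lines. Everything here is an assumption made for the example: the dictionary keys (`qual`, `pop_af`), the thresholds (QUAL 30, population allele frequency 1%), and the function names are hypothetical and are not GeneSpring GX settings. The one fixed fact is the coordinate convention: VCF positions are 1-based, while BED intervals are 0-based and half-open.

```python
from typing import Dict, List


def passes_filter(v: Dict, min_qual: float = 30.0, max_pop_af: float = 0.01) -> bool:
    """Keep variants with adequate quality that are not common polymorphisms."""
    if v.get("qual") is None or v["qual"] < min_qual:
        return False
    pop_af = v.get("pop_af")  # population allele frequency, if annotated
    return pop_af is None or pop_af <= max_pop_af


def to_bed_line(v: Dict) -> str:
    """Convert a 1-based VCF position to a 0-based, half-open BED interval."""
    start = v["pos"] - 1
    end = start + len(v["ref"])  # interval spans the reference allele
    return f"{v['chrom']}\t{start}\t{end}\t{v['ref']}>{v['alt']}"


def filter_to_bed(variants: List[Dict]) -> List[str]:
    """Apply the default filter and return BED lines for the kept variants."""
    return [to_bed_line(v) for v in variants if passes_filter(v)]
```

Variants without a population frequency annotation are kept here on the assumption that unannotated usually means rare; a stricter pipeline might treat them differently.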

Adding and Updating Publicly Available Annotations

Public annotation databases are available for download from Annotations Manager, as shown in Figure 5. VCF and BED files that list filtered and ranked variants can be saved as part of the Annotations Manager for a specific model organism. Data can be obtained either from the Agilent server or from the local desktop. This information can then be used to compare lists of mutations with annotated mutations derived from public sources (Figure 6), and viewed in the Genome Browser.

Annotate Region List can be used to append additional information from another Region List in the experiment, or from annotation databases such as DNase clusters, GENCODE genes, and so forth. The Import Region List utility allows the user to import region-based annotations that can be curated to obtain filtered regions for downstream processing.

Figure 5. Annotations Manager can store multiple builds for a given organism. Annotations for more than 30 different model organisms are available on the Agilent server for download, and custom annotations can be added for a specific build of a model organism.

Figure 6. In the variant analysis workflow, Agilent GeneSpring GX allows comparison of a source region list with a region list of choice in two different ways: either by finding overlaps, or by specifying the maximum distance X (in bp) between two regions for them to be considered close to each other.
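The two comparison modes in Figure 6 (overlap, or closeness within a maximum distance X in bp) can be sketched as interval tests. This is an illustrative sketch only; the `Region` type and the function names are hypothetical, and it assumes 0-based, half-open coordinates, which the source does not specify.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


def overlaps(a: Region, b: Region) -> bool:
    """True when the two regions share at least one base."""
    return a.chrom == b.chrom and max(a.start, b.start) < min(a.end, b.end)


def distance(a: Region, b: Region) -> Optional[int]:
    """Gap in bp between two regions; 0 if they touch or overlap,
    None if they lie on different chromosomes."""
    if a.chrom != b.chrom:
        return None
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def is_close(a: Region, b: Region, max_distance: int) -> bool:
    """True when the regions are on the same chromosome within max_distance bp."""
    d = distance(a, b)
    return d is not None and d <= max_distance
```

A pairwise test like this is quadratic over two large region lists; sorting both lists by chromosome and start, or using an interval tree, is the usual way to scale it.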

To identify genes and transcripts in a genomic region, GeneSpring GX takes a set of genome coordinates and retrieves a list of genes using Translate Regions To Genes. A desired flanking region can be set in the workflow. The result of this analysis is a list of genes that are near the selected Region List, within a certain distance (5,000 bp by default). For each gene, Find Genic Parts enables identification of exonic, intronic, upstream, and downstream regions based on a user-selected transcript model (RefSeq, Ensembl, or UCSC).

Figure 7. Histogram plot showing a translated gene list of regions with a specific variant. The colors represent the genic part that contains a specific variant such as an insertion, deletion, and so forth.
Axes: counts per chromosome; legend: upstream, intronic, exonic, downstream

Results Interpretation

For biological interpretation and contextualization of results, GeneSpring GX provides the following options:

Gene Ontology (GO)
The translated gene list can then be used as input to Gene Ontology analysis to identify each gene's molecular function, biological processes, or cellular localization.

Pathway Analysis
Users can query the list of genes against several pathway databases such as KEGG, BioCyc, and WikiPathways to identify statistically significant pathways that might be impacted by the variants identified in the study [2].

Multi-Omic Analysis
To explore the underlying mechanism by which various DNA variants affect a biological process, GeneSpring GX offers an overlay of translated genes on pathways in a single-omic as well as a multi-omic analysis (Figure 8). Multi-omic analysis in the GeneSpring suite is discussed in detail elsewhere [1].
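The two operations described above, finding genes within a flanking distance of a variant and labeling the genic part it falls in, can be sketched as follows. This is a simplified illustration, not the Translate Regions To Genes or Find Genic Parts implementation: the `Gene` record, the function names, and the single-transcript gene model are all hypothetical. Only the 5,000 bp default flank comes from the text.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # exclusive
    strand: str  # "+" or "-"
    exons: List[Tuple[int, int]]  # exon intervals, same convention


def genes_near(chrom: str, pos: int, genes: List[Gene], flank: int = 5000) -> List[Gene]:
    """Genes whose span, extended by `flank` bp on each side, contains pos."""
    return [g for g in genes
            if g.chrom == chrom and g.start - flank <= pos < g.end + flank]


def genic_part(pos: int, gene: Gene) -> str:
    """Label a position as exonic, intronic, upstream, or downstream of a gene."""
    if gene.start <= pos < gene.end:
        if any(s <= pos < e for s, e in gene.exons):
            return "exonic"
        return "intronic"
    before = pos < gene.start
    # Upstream means 5' of the gene, which depends on the strand.
    if gene.strand == "+":
        return "upstream" if before else "downstream"
    return "downstream" if before else "upstream"
```

Real transcript models carry several transcripts per gene, so the same position can be exonic in one transcript and intronic in another; that is why the choice of transcript model matters.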
Figure 8. MAP kinase pathway found to be significantly affected by mutations.
Legend: enriched genes with mutations from 1000 Genomes VCF data; differentially expressed genes from transcriptome experiment; enriched genes from both experiments
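The source does not say which statistical test GeneSpring GX uses to call a pathway significant. A common choice for this kind of over-representation analysis is the one-sided hypergeometric test, sketched below purely as an assumption. The function name `pathway_enrichment_pvalue` and its parameter names are hypothetical.

```python
from math import comb


def pathway_enrichment_pvalue(hits: int, list_size: int,
                              pathway_size: int, background: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    background:   number of genes in the reference set
    pathway_size: number of background genes annotated to the pathway
    list_size:    number of genes in the variant-affected gene list
    hits:         genes in the list that belong to the pathway
    """
    total = comb(background, list_size)
    upper = min(list_size, pathway_size)
    # Sum the probabilities of seeing `hits` or more pathway genes by chance.
    tail = sum(comb(pathway_size, i) * comb(background - pathway_size, list_size - i)
               for i in range(hits, upper + 1))
    return tail / total
```

When many pathways are tested at once, the resulting p-values would normally be adjusted for multiple testing before any pathway is reported as significant.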

Conclusion

Agilent GeneSpring GX software is a powerful exploratory tool for the identification, filtering, and curation of variants affecting a biological function. It offers high-resolution interactive browsing of reference genomes as well as different types of genomic annotations derived from a variety of public databases across complex datasets. The intuitive and easy-to-use pathway analysis utility allows merging variant data with proteomics and metabolomics in a multi-omic setting, as well as intergenomic analysis.

References

1. Molecular Subtypes in Glioblastoma Multiforme: Integrated Analysis Using Agilent GeneSpring and Mass Profiler Professional Multi-Omics Software, Agilent Technologies, publication number 5991-5505EN.
2. Correlation Analysis in Agilent GeneSpring and Mass Profiler Professional, Agilent Technologies, publication number 5991-5165EN.

www.agilent.com/chem
For Research Use Only. Not for use in diagnostic procedures.
This information is subject to change without notice.
© Agilent Technologies, Inc., 2017
Published in the USA, September 25, 2017
5991-8301EN