MedSavant: An open source platform for personal genome interpretation

Similar documents
Novel Variant Discovery Tutorial

Implementing ACMG guidelines on sequence variant interpretation: software-assisted variant curation and filtering

USER MANUAL for the use of the human Genome Clinical Annotation Tool (h-gcat) uthors: Klaas J. Wierenga, MD & Zhijie Jiang, P PhD

Implementing ACMG guidelines on sequence variant interpretation: software-assisted variant curation and filtering

Q4. WORK STREAMS Discovery, Cloud, Large Scale Genomics. DRIVER PROJECTS ELIXIR Beacon, EVA/EGA/ENA. WORK STREAMS Discovery

Implementing ACMG guidelines on sequence variant interpretation: software-assisted variant curation and filtering

Alissa Interpret The next evolution of Cartagenia Bench

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

Variant prioritization in NGS studies: Annotation and Filtering "

SUPPLEMENTARY INFORMATION

Introduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017

Using VarSeq to Improve Variant Analysis Research

HHS Public Access Author manuscript Nat Biotechnol. Author manuscript; available in PMC 2012 May 07.

An Automated Pipeline for NGS Testing and Reporting in a Commercial Molecular Pathology Lab: The Genoptix Case

Shannon pipeline plug-in: For human mrna splicing mutations CLC bio Genomics Workbench plug-in CLC bio Genomics Server plug-in Features and Benefits

The Matchmaker Exchange A Global Effort to Identify Novel Disease Genes. Ada Hamosh, MD, MPH Johns Hopkins University February 2017

ACCELERATING GENOMIC ANALYSIS ON THE CLOUD. Enabling the PanCancer Analysis of Whole Genomes (PCAWG) consortia to analyze thousands of genomes

An innovative approach to genetic testing for improved patient care

Genomic Data Is Going Google. Ask Bigger Biological Questions

Genomic solutions for complex disease

Clinician s Guide to Actionable Genes and Genome Interpretation

Nuts & Bolts of Informatics Infrastructure for Clinical Next Generation Sequencing (NGS)

Gathering of pathogenicity evidence for novel variants. By Lewis Pang

Prioritization: from vcf to finding the causative gene

Experiences in implementing large-scale biomedical workflows on the cloud: Challenges in transitioning to the clinical domain

Genetics and Bioinformatics

An Automated Pipeline for NGS Testing and Reporting in a Commercial Molecular Pathology Lab: The Genoptix Case

TOTAL CANCER CARE: CREATING PARTNERSHIPS TO ADDRESS PATIENT NEEDS

Personal Genomics Platform White Paper Last Updated November 15, Executive Summary

SureSelect Clinical Research Exome V2. Optimized for Rare Diseases

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014

Complete automation for NGS interpretation and reporting with evidence-based clinical decision support

NextSeq 500 System WGS Solution

The most trusted information for research and clinical genomics

Genomic Singularity Is Near

Infinium Global Screening Array-24 v1.0 A powerful, high-quality, economical array for population-scale genetic studies.

Reads to Discovery. Visualize Annotate Discover. Small DNA-Seq ChIP-Seq Methyl-Seq. MeDIP-Seq. RNA-Seq. RNA-Seq.

Automating the ACMG Guidelines with VSClinical. Gabe Rudy VP of Product & Engineering

Computational Challenges of Medical Genomics

Agilent NGS Solutions : Addressing Today s Challenges

Genomics and personalised medicine

Comments on Use of Databases for Establishing the Clinical Relevance of Human Genetic Variants

Bridging the Genomics-Health IT Gap for Precision Medicine

Genetics and Genomics in Clinical Research

SureSelect Clinical Research Exome V2 Definitive Answers Where it Matters Most

Informed Consent for Columbia Combined Genetic Panel (CCGP) for Adults Please read the following

BICF Variant Analysis Tools. Using the BioHPC Workflow Launching Tool Astrocyte

NGS in Pathology Webinar

Complementary Technologies for Precision Genetic Analysis

Introduction to iplant Collaborative Jinyu Yang Bioinformatics and Mathematical Biosciences Lab

Genetics/Genomics in Public Health. Role of Clinical and Biochemical Geneticists in Public Health

Understanding the science and technology of whole genome sequencing

BaseSpace Knowledge Network Variant interpretation is simplified with organized biomarker content curated from large public databases.

Briefly, this exercise can be summarised by the follow flowchart:

Variant calling in NGS experiments

Global Screening Array (GSA)

Biology 644: Bioinformatics

Setting Standards and Raising Quality for Clinical Bioinformatics. Joo Wook Ahn, Guy s & St Thomas 04/07/ ACGS summer scientific meeting

European Genome phenome Archive at the European Bioinformatics Institute. Helen Parkinson Head of Molecular Archives

2017 HTS-CSRS COMMUNITY PUBLIC WORKSHOP

Custom Panels via Clinical Exomes

Interpreting Genome Data for Personalised Medicine. Professor Dame Janet Thornton EMBL-EBI

- OMICS IN PERSONALISED MEDICINE

Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager

Next Generation Sequencing. Target Enrichment

OncoMD User Manual Version 2.6. OncoMD: Cancer Analytics Platform

Meeting report series

PeCan Data Portal. rnal/v48/n1/full/ng.3466.html

Elucidating the Pathogenicity of Rare Missense Variants with Statistically-Validated In vitro Functional Studies

Targeted resequencing

Annotating your variants: Ensembl Variant Effect Predictor (VEP) Helen Sparrow Ensembl EMBL-EBI 2nd November 2016

Information Technology for Genetic and Genomic Based Personalized Medicine. Submitted. April 23, 2008

Scottish Genomes Project: Taking Genomics into Healthcare. Zosia Miedzybrodzka

New Frontiers in Personalized Medicine

100,000 Genomes Project Rare Disease Programme. Dr Richard Scott Clinical Lead for Rare Disease, Genomics England

Transform Clinical Research Into Value-Based Personalized Care

211 Technology Platform Ontario s Most Recent Addition:

Fundamentals of Clinical Genomics

ILLUMINA SEQUENCING SYSTEMS

HaloPlex HS. Get to Know Your DNA. Every Single Fragment. Kevin Poon, Ph.D.

Xerox DocuShare 7.0 Content Management Platform. Enterprise content management for every organization.

Variant detection analysis in the BRCA1/2 genes from Ion torrent PGM data

Applied Bioinformatics

Human Genome. The Saudi. Program. An oasis in the desert of Arab medicine is providing clues to genetic disease. By the Saudi Genome Project Team

Pioneering Clinical Omics

Training materials.

Course Presentation. Ignacio Medina Presentation

Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4

Redefine what s possible with the Axiom Genotyping Solution

High-throughput scale. Desktop simplicity.

Bioinformatics for Proteomics. Ann Loraine

The University of California, Santa Cruz (UCSC) Genome Browser

The Need for Scientific. Data Annotation. Alick K Law, Ph.D., M.B.A. Marketing Manager IBM Life Sciences.

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

NLM Funded Research Projects Involving Text Mining/NLP

Axiom Biobank Genotyping Solution

Introduction to BIOINFORMATICS

Genomic Research: Issues to Consider. IRB Brown Bag August 28, 2014 Sharon Aufox, MS, LGC

Genomics and Healthcare A global and local perspective

Leveraging Informatics to Improve Health Outcomes and Value

Transcription:

MedSavant: An open source platform for personal genome interpretation Marc Fiume 1, James Vlasblom 2, Ron Ammar 3, Orion Buske 1, Eric Smith 1, Andrew Brook 1, Sergiu Dumitriu 2, Christian R. Marshall 2, Kym M. Boycott 4, Peter Ray 2, Gary D. Bader 1,3,5, Michael Brudno 1,2,* 1 Department of Computer Science, University of Toronto, Toronto, ON, Canada 2 The Hospital for Sick Children, Toronto, ON, Canada 3 The Donnelly Centre, Toronto, ON, Canada 4 Children s Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON, Canada 5 Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada * corresponding author

A key challenge in the broad deployment of personalized genomic medicine is the processing and interpretation of patients genomes in the context of their clinical indications. To address this challenge we developed MedSavant, an open-source, appbased platform facilitating the organization, storage, annotation, search, and visualization of patient genomes. MedSavant is accessible to geneticists of all levels of computational expertise, can be installed locally or in a scalable cloud environment, and is freely available (open source) at http://medsavant.org. The existing paradigm for human genome data analysis typically involves serial processing of flat text files followed by manual inspection of a small variant ser. These workflows require a substantial amount of both genetic and informatics expertise, are non-interactive and timeconsuming, and do not easily scale to meet the needs of clinical and other large-scale sequencing applications. Most freely available tools for the filtration and analysis of genomic variants rely on a command-line interface. For example, GATK 1 is a command-line tool that can be used to filter genetic variants stored in the text file-based variant call format (VCF). Similarly, GEMINI 2 introduced a datastore for genetic variants and an SQL-like language for expressing common queries against it. The few graphical tools do not scale to hundreds of human genomes: VarB 3 and VarSifter 4 are solutions for variant search that run entirely on the desktop, placing limitations on performance and restricting data access to a single computer. A number of related commercial tools are also available, however these are typically closed platforms that are not easily extendable. To address these issues we have developed MedSavant. MedSavant can be installed on local commodity hardware or in cloud environments, and uses a secure protocol (SSL) to ensure privacy of patient data. A centralized datastore organizes patient (e.g. pedigree, disease, phenotype), genetic (e.g. read alignments, genetic variants) and other information (e.g. cohorts, gene panels, and variant comments). Genetic variants in VCF format can be uploaded via a drag-and-drop interface and are automatically annotated with a configurable set of functional, population, and disease information. MedSavant uses a columnar database technology (Infobright, http://www.infobright.org) which exhibits superior data compression compared to flat files (up to 10X) and faster query times (up to 150X) compared to commonly used command-line utilities (see Supplement, Section 2). Significant advancements in retrieval speed enable interactive exploration of large and richly annotated genomic datasets. High-throughput DNA sequencing is used both for research (e.g. disease gene mapping) and in the clinic (e.g. diagnostics, incidental findings, pharmacogenetics). To support a wide range of applications, each workflow in MedSavant is implemented as a separate app, using an open Application Programming Interface (API). Tutorials and example code for app development are available on the MedSavant website. A publicly accessible App Store is available to encourage developers of open-source bioinformatics tools to easily produce and distribute MedSavant apps. MedSavant is shipped with several apps immediately useful for both clinical and research workflows: The Discovery app is a clinically-oriented variant search tool empowering clinicians to quickly identify rare and potentially causal variants in a patient s genome. It automatically filters

variants based on variant quality, harmfulness predictions (e.g. from Polyphen 5 ), mutational effect (from Jannovar 6 ), gene panels, and allele frequencies from population databases (e.g. 1000 Genome Project 7, 6500 exome NHLBI-ESP, http://evs.gs.washington.edu/evs/). Included as a default gene panel is the set of 56 incidental genes described by the American College of Medical Genetics and Genomics (ACMG) 8, facilitating rapid incidental finding detection. Variants are annotated with modes of inheritance from ~2,800 genes, as described in the Clinical Genomic Database 9 and linked to Clinvar 10, to accelerate identification of clinically relevant variants. In Figure 1A we use the Discovery app to identify the causal variant in a patient with retinitis pigmentosa. Using default settings, a total of 520 missense and loss-offunction (LOF) mutations were identified that passed quality control thresholds. Using the Human Phenotype Ontology 11 to identify genes associated with decreased central vision, a single missense mutation was observed in the gene RPE65, which has been previously implicated in the disease 12. For more advanced users, we created the Variant Navigator app, a general-purpose variant search engine. Queries can be constructed graphically based on criteria such as patient features (e.g. age, sex, phenotype), genotype features (e.g. quality values, genomic region, functional effect) and annotations (e.g. Gene Ontology 13, OMIM 14, COSMIC 15 ). The resulting variants can be visualized using chromosomal heatmaps, searchable spreadsheets, or the integrated genome browser 16. Detailed information is provided for selected variants and the genes in which they reside, including functionally similar genes as predicted by the GeneMANIA functional interaction network tool 17. Mendel is a disease gene identification app to guide the resolution of genetic disorders through case-control and pedigree analyses. Given pedigree information, Mendel can perform segregation based on inheritance models. We benchmarked Mendel on 14 rare disorders studied by the FORGE consortium, a Canadian effort aimed at identifying the genes responsible for rare childhood diseases 18. A MedSavant database was created containing 138,640,418 SNVs and small indels from exome sequencing of 424 FORGE subjects. For each of the chosen disorders, Mendel was used to identify genes having damaging variants that segregated in patients affected by the disorder, compared to the remaining (control) individuals. For example, the query shown in Figure 1B revealed missense and splicing mutations in C5orf42 for patients affected by Joubert Syndrome 19, with no other results as shown in Figure 1C. In total, the causal gene was independently discovered for 13/14 (93%) of the selected disorders. For the disorder that could not be resolved using Mendel, two of the causal variants did not pass basic quality filters; they were validated with Sanger sequencing in the original study. A summary of Mendel queries used for each disorder is provided in Section 3 of the Supplement. Growth in the size of sequencing repositories has inspired the development of cloud-based platforms for genome sequence data hosting and access. The MedSavant app for Google Genomics is a proof-of-concept that provides a visual interface for listing a large number of publicly accessible read alignment datasets remotely hosted through Google s API (developers.google.com/genomics) 20 and loading them as tracks that can be interactively explored in MedSavant s genome browser. This app demonstrates the potential to efficiently deliver large data collections to researchers with minimal computational infrastructure by leveraging cloud resources. The need to share genomic data has been recognized through initiatives like the Global Alliance for Genomics & Health (GA4GH; genomicsandhealth.org);

future versions will make use of Google and GA4GH APIs to integrate remotely hosted read and variant datasets into other MedSavant apps. The MedSavant App framework supports realtime cooperation among installed apps. For example, from within the Patient Directory a user can select a patient, further filter the patient s variants using the Variant Navigator, and inspect the results using the Genome Browser or further analyze them with Mendel. The connectivity between apps is exemplified in Figure 2. The process of interpreting genomes within this environment is conceptually different from current approaches, which involve serial processing of data using independent tools. The collaborative system simplifies development and integration of tools, and encourages the growth of an ecosystem that will increase in power as more apps continue being developed, together making the translation of human genome and associated data into research discoveries and clinical applications easier and more efficient.

Figures Figure 1: (A) Single patient analysis using Discovery, identifying a missense mutation causing retinitis pigmentosa. Mutations are filtered to low-frequency harmful, and then intersected for the set of genes previously implicated in decreased vision based on the Human Phenotype Ontology. (B) Mendel query used to identify harmful variants in a cohort of 9 French-Canadians with Joubert syndrome. Starting from a list of exonic, splicing, or UTR mutations with Allele Frequency <.05 and Quality > 30 (entered using the Variant Navigator) the query yields (C) only five variants, all in the C5orf42 gene, suggesting that it is causal.

Figure 2: MedSavant dashboard (top-left) and apps. The Patient Directory (top-right) displays patient, family, and cohort information. Genetic variants for individual or groups of patients can be flexibly searched in the Variant Navigator (bottom-left). Search criteria are configured in the left panel, and results are shown in the spreadsheet to the right. Conditions can be constructed based on patient (e.g. sex), genetic (e.g. chromosome), annotation (e.g. harmfulness), and ontology (e.g. Gene Ontology) information. Variants of interest can be visualized using the Savant Genome Browser (bottom-right).

References 1. McKenna, A. et al. Genome Res. 20, 1297 1303 (2010). 2. Paila, U. et al. PLoS Comput. Biol. 9, e1003153 (2013). 3. Preston, M. D. et al. Bioinformatics 28, 2983 2985 (2012). 4. Teer, J. K., et al. Bioinformatics 28, 599 600 (2011). 5. Adzhubei, I. A. et al. Nat. Methods 7, 248 249 (2010). 6. Jäger, M. et al. Hum. Mutat. 35, 548 555 (2014). 7. Siva, N. Nat. Biotechnol. 26, 256 (2008). 8. Green, R. C. et al. Genet. Med. 15, 565 574 (2013). 9. Solomon, B. D. et al. Proc. Natl. Acad. Sci. U. S. A. 110, 9851 9855 (2013). 10. Landrum, M. J. et al. Nucleic Acids Res. 42, D980 5 (2013). 11. Robinson, P. N. et al. Am. J. Hum. Genet. 83, 610 615 (2008). 12. Roman, A. J. et al. Invest. Ophthalmol. Vis. Sci. 54, 1378 1383 (2013). 13. Ashburner, M. et al. Nat. Genet. 25, 25 29 (2000). 14. Hamosh, A., T. Hum. Mutat. 15, 57-61(2001). 15. Bamford, S. et al. Br. J. Cancer 91, 355 358 (2004). 16. Fiume, M. et al. Nucleic Acids Res. 40, W615 21 (2012). 17. Warde-Farley, D. et al. Nucleic Acids Res. 38, W214 20 (2010). 18. Beaulieu, C. L. et al. Am. J. Hum. Genet. 94, 809 817 (2014). 19. Srour, M. et al. Am. J. Hum. Genet. 90, 693 700 (2012). 20. Gruber, K. Nat. Biotechnol. 32, 508 (2014)