Tools for the design and analysis of gene panel data

Hospital Sant Pau, Barcelona, 16 Jun 2016. BIER. Francisco García (fgarcia@cipf.es), Computational Genomics Department, CIPF

Introduction: Príncipe Felipe Research Center (CIPF). Goal: biomedical research. Basic research in genes, targets, molecular and cellular processes, nanomedicine and computational medicine. Translation into clinical practice: personalized medicine, cancer, rare diseases, metabolic and functional impairment. http://www.cipf.es/

Who are we? The Computational Genomics Department at the Príncipe Felipe Research Center. Team: a multidisciplinary group of 14 researchers and technicians led by Joaquín Dopazo. http://bioinfo.cipf.es/

Why are we interested in Computational Genomics? The overall goal of the department: apply computational methods to biomedical and biotechnological problems. Research interests: the development and application of novel bioinformatics methods aimed at discovering new drugs; identification of genes or proteins that may be considered therapeutic targets; personalized medicine: tools for discovery and diagnosis.

Why are we interested in Computational Genomics? Personalized medicine and Mendelian diseases.

Big Data: omics sciences. Genomics, transcriptomics, epigenomics, proteomics, metabolomics, lipidomics.

Big Data: clinical and biological databases. Sources of biological and clinical knowledge: Gene Ontology, KEGG pathways, miRNA, CisRed, Biocarta pathways, InterPro motifs, HUMSAVAR, gene expression in tissues, transcription factor binding sites, regulatory elements, bioentities from literature, ClinVar, HGMD, COSMIC.

How do we work? Our department collaborates in different research projects and converts researchers' needs into bioinformatics solutions. We release free software for several reasons: any customer can try our tools; the scientific community can test our software; this is the current trend in computational genomics.

How do we work? BIER. The CIBER thematic area for rare diseases (CIBERER) is the reference center in Spain for research on rare diseases: http://www.ciberer.es/ Objective: to coordinate and promote basic, clinical and epidemiological research, to help the research carried out in laboratories reach the patient, and to give scientific answers to the questions that arise from the interaction between physicians and patients. CIBERER has a team of more than 700 professionals and integrates 62 research groups.

How do we work? Genomic data analysis courses. CIBERER course on genomic data analysis, 28-30 Sep 2016, Valencia: http://bioinfo.cipf.es/mda15ciberer International Course of Genomic Data Analysis, Mar 2017, Valencia: http://bioinfo.cipf.es/gda16/program/ Courses: http://bioinfo.cipf.es/courses

Web tools to analyze gene panel data. BIER, CIPF Computational Genomics Department.

Outline: 1) Introduction to NGS data analysis 2) TEAM 3) PanelMaps. Next: Introduction to NGS data analysis.

NGS technologies: how do these technologies work? [Figure label: reference genome]

Genomics data analysis pipeline (primary and secondary analysis)
1. Sequence preprocessing (FASTQ)
2. Mapping (BAM)
3. Variant calling (VCF)
4. Variant annotation (VCF)
5. Variant prioritization
Followed by studies of genomic variation.

FASTQ format (NGS data analysis: file formats)
In short, a FASTA file with qualities. Each record has four lines:
1. Header (like FASTA, but starting with @)
2. Sequence (string of nucleotides)
3. + and, optionally, the sequence ID
4. Encoded quality of the sequence
Example:
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
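
The four-line layout described above can be read with a few lines of Python. This is a minimal sketch, not a tool from the talk: it assumes unwrapped four-line records and the Phred+33 quality encoding (the `offset` parameter), and the names `FastqRecord` and `parse_fastq` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str             # header text without the leading "@"
    sequence: str         # nucleotide string
    qualities: List[int]  # one Phred score per base


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (assumes no wrapped sequences)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(stripped, None)
        separator = next(stripped, None)
        quality = next(stripped, None)
        if sequence is None or separator is None or quality is None:
            raise ValueError("truncated FASTQ record: " + header)
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ: " + header)
        # Assumed encoding: Phred score = ASCII code of the character minus offset.
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


# The example record shown on the slide.
example = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
]
record = next(parse_fastq(example))
print(record.name, len(record.sequence), record.qualities[:4])
```

Under the Phred+33 assumption, the leading `!` character decodes to the lowest possible quality, 0.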

BAM/SAM format (NGS data analysis: file formats)
Header:
@PG ID:HPG-Aligner VN:1.0
@SQ SN:20 LN:63025520
Alignments:
HWI-ST700660_138:2:2105:7292:79900#2@0/1 16 20 76703 254 76= * 0 0 GTTTAGATACTGAAAGGTACATACTTCTTTGTAGGAACAAGCTATCATGCTGCATTTCTATAATATCACATGAATA GIJGJLGGFLILGGIEIFEKEDELIGLJIHJFIKKFELFIKLFFGLGHKKGJLFIIGKFFEFFEFGKCKFHHCCCF AS:i:254 NH:i:1 NM:i:0
HWI-ST700660_138:2:2208:6911:12246#2@0/1 16 20 76703 254 76= * 0 0 GTTTAGATACTGAAAGGTACATACTTCTTTGTAGGAACAAGCTATCATGCTGCATTTCTATAATATCACATGAATA HHJFHLGFFLILEGIKIEEMGEDLIGLHIHJFIKKFELFIKLEFGKGHEKHJLFHIGKFFDFFEFGKDKFHHCCCF AS:i:254 NH:i:1 NM:i:0
HWI-ST700660_138:2:1201:2973:62218#2@0/1 0 20 76655 254 76M * 0 0 AACCCCAAAAATGTTGGAAGAATAATGTAGGACATTGCAGAAGACGATGTTTAGATACTGAAAGGGACATACTTCT FEFFGHHHGGHFKCCJKFHIGIFFIFLDEJKGJGGFKIHLFIJGIEGFLDEDFLFGEIIMHHIKL$BBGFFJIEHE AS:i:254 NH:i:1 NM:i:1
HWI-ST700660_138:2:1203:21395:164917#2@0/1 256 20 68253 254 4M1D72M * 0 0 NCACCCATGATAGACCAGTAAAGGTGACCACTTAAATTCCTTGCTGTGCAGTGTTCTGTATTCCTCAGGACACAGA #4@ADEHFJFFEJDHJGKEFIHGHBGFHHFIICEIIFFKKIFHEGJEHHGLELEGKJMFGGGLEIKHLFGKIKHDG AS:i:254 NH:i:3 NM:i:1
HWI-ST700660_138:2:1105:16101:50526#6@0/1 16 20 126103 246 53M4D23M * 0 0 AAGAAGTGCAAACCTGAAGAGATGCATGTAAAGAATGGTTGGGCAATGTGCGGCAAAGGGACTGCTGTGTTCCAGC FEHIGGHIGIGJI6FCFHJIFFLJJCJGJHGFKKKKGIJKHFFKIFFFKHFLKHGKJLJGKILLEFFLIHJIEIIB AS:i:368 NH:i:1 NM:i:4
SAM Specification: http://samtools.sourceforge.net/sam1.pdf
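
To make the alignment columns concrete, here is a small sketch that splits one SAM line into its mandatory fields, decodes three FLAG bits and parses the CIGAR string. It follows the field order of the SAM specification; the function names are hypothetical, the line is assumed to be tab-separated as in a real SAM file, and SEQ and QUAL are replaced by `*` in the example for brevity.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '4M1D72M' into (length, operation) pairs."""
    if cigar == "*":
        return []
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError("malformed CIGAR: " + cigar)
    return ops


def parse_alignment(line: str) -> dict:
    """Parse the 11 mandatory SAM fields of one tab-separated alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    flag = int(fields[1])
    pos = int(fields[3])
    cigar = parse_cigar(fields[5])
    ref_len = sum(n for n, op in cigar if op in REF_CONSUMING)
    return {
        "qname": fields[0],
        "rname": fields[2],
        "pos": pos,                # 1-based leftmost mapped position
        "end": pos + ref_len - 1,  # 1-based, inclusive (pos - 1 if CIGAR is "*")
        "mapq": int(fields[4]),
        "cigar": cigar,
        "unmapped": bool(flag & 0x4),
        "reverse": bool(flag & 0x10),
        "secondary": bool(flag & 0x100),
        "tags": fields[11:],       # optional TAG:TYPE:VALUE fields, left as text
    }


# Fourth alignment from the slide, with SEQ and QUAL shortened to "*".
line = "\t".join([
    "HWI-ST700660_138:2:1203:21395:164917#2@0/1", "256", "20", "68253", "254",
    "4M1D72M", "*", "0", "0", "*", "*", "AS:i:254", "NH:i:3", "NM:i:1",
])
aln = parse_alignment(line)
print(aln["rname"], aln["pos"], aln["end"], aln["secondary"], aln["cigar"])
```

The one-base deletion (`1D`) consumes a reference position but no read base, which is why the 76-base read spans 77 reference positions.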

VCF format (NGS data analysis: file formats): http://www.1000genomes.org/
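
The slide shows the VCF layout only as an image. As a rough illustration, the sketch below reads the eight fixed columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The record used is an illustrative example, not data from the talk, and `Variant`, `parse_info` and `read_vcf` are hypothetical names. Genotype columns are ignored.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Union


class Variant(NamedTuple):
    chrom: str
    pos: int                # 1-based position
    variant_id: str
    ref: str
    alts: List[str]         # ALT may list several alleles separated by commas
    qual: Optional[float]   # None when the column holds "."
    filters: str
    info: Dict[str, Union[str, bool]]


def parse_info(text: str) -> Dict[str, Union[str, bool]]:
    """Turn 'NS=3;DP=14;DB' into {'NS': '3', 'DP': '14', 'DB': True}."""
    info: Dict[str, Union[str, bool]] = {}
    if text in ("", "."):
        return info
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # keys without "=" are flags
    return info


def read_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError("expected at least 8 tab-separated columns")
        chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
        yield Variant(chrom, int(pos), vid, ref, alt.split(","),
                      None if qual == "." else float(qual), filt, parse_info(info))


# Illustrative record (not from the talk).
vcf_text = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2",
]
variants = list(read_vcf(vcf_text))
print(variants[0].chrom, variants[0].pos, variants[0].ref, variants[0].alts)
```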

BED format (NGS data analysis: file formats): https://genome.ucsc.edu/faq/faqformat.html#format1
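
A BED file lists target regions as chromosome, start and end, where the start is 0-based and the end is exclusive, so a region's length is simply end minus start. The sketch below reads the first three columns plus an optional name; the regions and gene names in the example are hypothetical, as are `BedRegion` and `parse_bed`.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class BedRegion(NamedTuple):
    chrom: str
    start: int            # 0-based, inclusive
    end: int              # 0-based, exclusive
    name: Optional[str]   # fourth column, if present

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_bed(lines: Iterable[str]) -> Iterator[BedRegion]:
    """Yield regions, skipping comments and 'track'/'browser' lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError("BED lines need at least chrom, start and end")
        start, end = int(fields[1]), int(fields[2])
        if end < start:
            raise ValueError("end before start: " + line)
        yield BedRegion(fields[0], start, end, fields[3] if len(fields) > 3 else None)


# Hypothetical panel regions on chromosome 20.
bed_text = [
    "track name=panel",
    "20\t76600\t76800\tGENE_A",
    "20\t126000\t126200",
]
regions = list(parse_bed(bed_text))
print([(r.chrom, r.start, r.end, r.name, r.length) for r in regions])
```

Because SAM and VCF positions are 1-based, a position `p` from those files falls inside a BED region when `start < p <= end`.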

Genomics data analysis pipeline (primary and secondary analysis)
1. Sequence preprocessing (FASTQ)
2. Mapping (BAM)
3. Variant calling (VCF)
4. Variant annotation (VCF)
5. Variant prioritization
Followed by studies of genomic variation.
Tools placed along this pipeline on the slide: PanelMaps, BiERapp, TEAM, CSVS.

Outline: 1) Introduction to NGS data analysis 2) TEAM 3) PanelMaps. Next: TEAM (Targeted Enrichment Analysis and Management).

Can I interpret sequencing data for diagnosis? TEAM (Targeted Enrichment Analysis and Management), a BIER tool: http://team.babelomics.org/beta/

Introduction: TEAM combines sequencing data with biological knowledge (ClinVar, HGMD, HUMSAVAR, COSMIC) to support diagnosis.

How does TEAM work? Inputs: 1. VCF files 2. Gene panel. TEAM draws on ClinVar, HUMSAVAR, HGMD and COSMIC.
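
One way to picture the combination of the two inputs is to keep only the variants that fall inside the panel's genes. The sketch below is an illustration under stated assumptions, not TEAM's implementation: the panel is represented as hypothetical gene regions in BED-style coordinates (0-based start, exclusive end), variants use 1-based VCF positions, and lookups against ClinVar, HUMSAVAR, HGMD or COSMIC are left out.

```python
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int, str]   # chrom, start (0-based), end (exclusive), gene
Variant = Tuple[str, int, str, str]  # chrom, pos (1-based), ref, alt


def variants_in_panel(variants: Iterable[Variant],
                      panel: Iterable[Region]) -> Dict[str, List[Variant]]:
    """Group variants by the panel gene whose region contains them."""
    panel = list(panel)
    hits: Dict[str, List[Variant]] = {}
    for variant in variants:
        chrom, pos = variant[0], variant[1]
        for r_chrom, start, end, gene in panel:
            # A 1-based position p lies in a 0-based half-open region
            # [start, end) when start < p <= end.
            if chrom == r_chrom and start < pos <= end:
                hits.setdefault(gene, []).append(variant)
    return hits


# Hypothetical panel and variants, for illustration only.
panel = [("20", 76600, 76800, "GENE_A"), ("20", 126000, 126200, "GENE_B")]
variants = [
    ("20", 76700, "G", "A"),   # inside GENE_A
    ("20", 90000, "C", "T"),   # outside the panel
    ("20", 126200, "T", "C"),  # last base of GENE_B
    ("20", 126201, "A", "G"),  # one base past GENE_B
]
hits = variants_in_panel(variants, panel)
print(hits)
```

The linear scan over regions is enough for a small panel; a sorted or indexed lookup would be the natural next step for larger ones.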

Getting information. SIFT predicts whether an amino acid substitution affects protein function. Interpretation: from 1 (tolerated) to 0 (not tolerated). http://sift.jcvi.org/ PolyPhen (Polymorphism Phenotyping) is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein. Interpretation: from 1 (probably damaging) to 0 (benign). http://genetics.bwh.harvard.edu/pph2/index.shtml
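
The two scores run in opposite directions: low SIFT scores and high PolyPhen scores both point to a damaging substitution. The sketch below turns the scores into labels. The cutoffs are assumptions based on commonly quoted values (0.05 for SIFT; 0.446 and 0.908 for PolyPhen-2), not figures from the talk, so they are exposed as parameters and should be checked against the tool versions in use.

```python
def classify_sift(score: float, cutoff: float = 0.05) -> str:
    """Label a SIFT score in [0, 1]; lower means less tolerated.

    The 0.05 cutoff is an assumed, commonly quoted default.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores lie between 0 and 1")
    return "deleterious" if score <= cutoff else "tolerated"


def classify_polyphen(score: float,
                      possibly: float = 0.446,
                      probably: float = 0.908) -> str:
    """Label a PolyPhen score in [0, 1]; higher means more damaging.

    Both cutoffs are assumed, commonly quoted defaults.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen scores lie between 0 and 1")
    if score > probably:
        return "probably damaging"
    if score > possibly:
        return "possibly damaging"
    return "benign"


# Hypothetical scores for three substitutions: (SIFT, PolyPhen).
for sift, polyphen in [(0.01, 0.95), (0.20, 0.60), (0.80, 0.10)]:
    print(sift, classify_sift(sift), polyphen, classify_polyphen(polyphen))
```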

Getting information: consequence type or effect. http://www.ensembl.org/info/genome/variation/predicted_data.html

How does TEAM work? 1. Defining a panel 2. Uploading input data 3. Getting results. http://team.babelomics.org/beta/

How to define a panel? 1. Name the panel 2. Select diseases 3. Add more genes and mutations 4. Save the panel

How to define a panel? Adding new mutations; checking mutations in the Genome Viewer.

Web results: 1. OVERVIEW 2. DIAGNOSTIC 3. SECONDARY FINDINGS

Reporting results: 4. REPORT (PDF)

Remarks: TEAM is a free web tool. It is easy to use and powerful. TEAM supports diagnosis.

More information: TEAM tutorial (BIER): http://ciberer.es/bier/team

Outline: 1) Introduction to NGS data analysis 2) TEAM 3) PanelMaps. Next: PanelMaps.

Can I visualize and detect deletions in gene panel data? PanelMaps (BIER): a web tool to analyze gene panel data.

How does PanelMaps work? Inputs: 1. BAM files 2. BED file. Purpose: detecting altered regions in targeted sequencing.

How does PanelMaps work? Coverage description.
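
Coverage description usually starts from the read depth at each base of a target region. The sketch below shows one simple way to compute it from aligned read intervals. It is an assumption-laden illustration, not PanelMaps' own code: reads and regions are plain (chrom, start, end) tuples in 0-based half-open coordinates, the target region is hypothetical, and in practice the depth would come from the BAM file through a dedicated tool or library.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # chrom, start (0-based), end (exclusive)


def region_depth(region: Interval, alignments: Iterable[Interval]) -> List[int]:
    """Return the number of reads covering each base of the region."""
    chrom, start, end = region
    depth = [0] * (end - start)
    for a_chrom, a_start, a_end in alignments:
        if a_chrom != chrom:
            continue
        # Clip the read to the region before counting.
        for i in range(max(start, a_start), min(end, a_end)):
            depth[i - start] += 1
    return depth


def summarize(depth: List[int], min_depth: int = 20) -> Dict[str, float]:
    """Mean depth and fraction of bases with at least min_depth reads."""
    n = len(depth)
    if n == 0:
        return {"mean": 0.0, "fraction_covered": 0.0}
    return {
        "mean": sum(depth) / n,
        "fraction_covered": sum(d >= min_depth for d in depth) / n,
    }


# Three reads from the SAM slide, converted to 0-based half-open intervals:
# POS 76703 with 76= (twice) and POS 76655 with 76M.
reads = [("20", 76702, 76778), ("20", 76702, 76778), ("20", 76654, 76730)]
target = ("20", 76700, 76740)  # hypothetical target region
depth = region_depth(target, reads)
stats = summarize(depth, min_depth=2)
print(depth[:4], stats)
```

The `min_depth` threshold is a free parameter here; the value that counts as sufficient coverage depends on the assay.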

How does PanelMaps work? Detection of altered regions. Demo: http://panelmaps.juanes.xyz/dashboard/pmapsdemo
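
A common way to look for altered regions in targeted sequencing is to compare each region's share of a sample's coverage with the same share in reference samples. The sketch below illustrates that idea only. It is not the method PanelMaps uses, which the slides do not describe, and the region names, coverage values and ratio cutoffs (0.7 and 1.3) are hypothetical.

```python
from statistics import median
from typing import Dict, List


def normalize(coverage: Dict[str, float]) -> Dict[str, float]:
    """Express each region's mean coverage as a share of the sample total."""
    total = sum(coverage.values())
    if total == 0:
        raise ValueError("sample has no coverage")
    return {region: value / total for region, value in coverage.items()}


def flag_altered(sample: Dict[str, float],
                 references: List[Dict[str, float]],
                 low: float = 0.7,
                 high: float = 1.3) -> Dict[str, str]:
    """Flag regions whose normalized coverage departs from the reference median."""
    sample_n = normalize(sample)
    refs_n = [normalize(ref) for ref in references]
    calls: Dict[str, str] = {}
    for region, value in sample_n.items():
        baseline = median(ref[region] for ref in refs_n)
        if baseline == 0:
            continue  # region not captured in the references
        ratio = value / baseline
        if ratio < low:
            calls[region] = "possible deletion"
        elif ratio > high:
            calls[region] = "possible duplication"
    return calls


# Hypothetical mean coverage per target region.
references = [
    {"A": 100, "B": 100, "C": 100, "D": 100},
    {"A": 200, "B": 200, "C": 200, "D": 200},
]
sample = {"A": 100, "B": 50, "C": 100, "D": 100}
calls = flag_altered(sample, references)
print(calls)
```

Normalizing by the sample total removes differences in sequencing depth between runs, at the cost of slightly inflating the ratios of unaffected regions when one region is lost.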

Any comments or questions? Web tools for gene panel data.