Tools for the design and analysis of gene panel data
Hospital Sant Pau, Barcelona, 16 Jun 2016
BIER
Francisco García, fgarcia@cipf.es
Computational Genomics Department, CIPF
Introduction: Príncipe Felipe Research Center (CIPF)
Goal: biomedical research
- Basic research in genes, targets, molecular and cellular processes, nanomedicine and computational medicine
- Translation into clinical practice: personalized medicine, cancer, rare diseases, metabolic and functional impairment
http://www.cipf.es/
Introduction: Computational Genomics Department
Who are we?
The Computational Genomics Department at the Príncipe Felipe Research Center
Team: a multidisciplinary group of 14 researchers and technicians led by Joaquín Dopazo
http://bioinfo.cipf.es/
Why are we interested in Computational Genomics?
The overall goal of the department: apply computational methods to biomedical and biotechnological problems
Research interests:
- The development and application of novel bioinformatics methods aimed at discovering new drugs
- Identification of genes or proteins that may be considered therapeutic targets
- Personalized medicine: tools for discovery and diagnosis
Why are we interested in Computational Genomics? Introduction Personalized Medicine
Why are we interested in Computational Genomics? Introduction Personalized Medicine and Mendelian Diseases
Introduction: Omics sciences
Big Data: Genomics, Transcriptomics, Epigenomics, Proteomics, Metabolomics, Lipidomics
Introduction: Clinical and biological databases
Big Data: biological knowledge and clinical knowledge
Gene Ontology, KEGG pathways, Biocarta pathways, InterPro motifs, regulatory elements, miRNA, CisRed, transcription factor binding sites, gene expression in tissues, bioentities from literature, HUMSAVAR, ClinVar, HGMD, COSMIC
How do we work?
Our department collaborates in different research projects and converts researchers' needs into bioinformatics solutions.
We release free software for several reasons:
- Any customer can try our tools
- The scientific community can test our software
- This is the current trend in Computational Genomics
How do we work?
The CIBER in its Thematic Area of Rare Diseases (CIBERER) is the reference center in Spain for research on rare diseases: http://www.ciberer.es/
Goal: to coordinate and promote basic, clinical and epidemiological research, to help the research carried out in laboratories reach the patient, and to give scientific answers to the questions that arise from the interaction between doctors and patients.
CIBERER comprises a team of more than 700 professionals and integrates 62 research groups.
BIER
Introduction: Genomic Data Analysis courses
How do we work?
- CIBERER course on genomic data analysis, 28-30 Sep 2016, Valencia: http://bioinfo.cipf.es/mda15ciberer
- International course on Genomic Data Analysis, Mar 2017, Valencia: http://bioinfo.cipf.es/gda16/program/
http://bioinfo.cipf.es/courses
Web tools to analyze gene panel data BIER CIPF Genomic Computational Department
Outline
1) Introduction to NGS Data Analysis
Web Tools:
2) TEAM
3) PanelMaps
Introduction to NGS data analysis
NGS technologies How do these technologies work? Reference genome Introduction NGS data analysis
Introduction: Genomics Data Analysis Pipeline (studies of genomic variation)
Primary analysis, secondary analysis
1. Sequence preprocessing (Fastq)
2. Mapping (BAM)
3. Variant calling (VCF)
4. Variant annotation (VCF)
5. Variant prioritization
Introduction: NGS data analysis, file formats
Fastq format
We could say it is a FASTA with qualities. Each record has four lines:
1. Header (like the FASTA header but starting with @)
2. Sequence (string of nt)
3. + and sequence ID (optional)
4. Encoded quality of the sequence

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
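As a minimal sketch of how the four-line record above can be read, the following Python snippet parses the example record and decodes its quality line. It assumes Phred+33 encoding, which the slide does not state, and the names `FastqRecord` and `parse_fastq_record` are illustrative only.

```python
from typing import List, NamedTuple


class FastqRecord(NamedTuple):
    seq_id: str
    sequence: str
    qualities: List[int]


def parse_fastq_record(lines, offset=33):
    """Parse one four-line FASTQ record.

    offset=33 assumes Phred+33 quality encoding (an assumption,
    older Illumina files may use a different offset).
    """
    header, sequence, plus, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@"):
        raise ValueError("header line must start with '@'")
    if not plus.startswith("+"):
        raise ValueError("third line must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Each quality character encodes one Phred score: ord(char) - offset.
    scores = [ord(char) - offset for char in quality]
    return FastqRecord(header[1:], sequence, scores)


# The example record shown on the slide.
record_lines = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
]
record = parse_fastq_record(record_lines)
print(record.seq_id, len(record.sequence), min(record.qualities))
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so the first base of this read (score 0) is essentially uncalled.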
BAM/SAM format
Header:
@PG ID:HPG-Aligner VN:1.0
@SQ SN:20 LN:63025520
Alignments:
HWI-ST700660_138:2:2105:7292:79900#2@0/1 16 20 76703 254 76= * 0 0 GTTTAGATACTGAAAGGTACATACTTCTTTGTAGGAACAAGCTATCATGCTGCATTTCTATAATATCACATGAATA GIJGJLGGFLILGGIEIFEKEDELIGLJIHJFIKKFELFIKLFFGLGHKKGJLFIIGKFFEFFEFGKCKFHHCCCF AS:i:254 NH:i:1 NM:i:0
HWI-ST700660_138:2:2208:6911:12246#2@0/1 16 20 76703 254 76= * 0 0 GTTTAGATACTGAAAGGTACATACTTCTTTGTAGGAACAAGCTATCATGCTGCATTTCTATAATATCACATGAATA HHJFHLGFFLILEGIKIEEMGEDLIGLHIHJFIKKFELFIKLEFGKGHEKHJLFHIGKFFDFFEFGKDKFHHCCCF AS:i:254 NH:i:1 NM:i:0
HWI-ST700660_138:2:1201:2973:62218#2@0/1 0 20 76655 254 76M * 0 0 AACCCCAAAAATGTTGGAAGAATAATGTAGGACATTGCAGAAGACGATGTTTAGATACTGAAAGGGACATACTTCT FEFFGHHHGGHFKCCJKFHIGIFFIFLDEJKGJGGFKIHLFIJGIEGFLDEDFLFGEIIMHHIKL$BBGFFJIEHE AS:i:254 NH:i:1 NM:i:1
HWI-ST700660_138:2:1203:21395:164917#2@0/1 256 20 68253 254 4M1D72M * 0 0 NCACCCATGATAGACCAGTAAAGGTGACCACTTAAATTCCTTGCTGTGCAGTGTTCTGTATTCCTCAGGACACAGA #4@ADEHFJFFEJDHJGKEFIHGHBGFHHFIICEIIFFKKIFHEGJEHHGLELEGKJMFGGGLEIKHLFGKIKHDG AS:i:254 NH:i:3 NM:i:1
HWI-ST700660_138:2:1105:16101:50526#6@0/1 16 20 126103 246 53M4D23M * 0 0 AAGAAGTGCAAACCTGAAGAGATGCATGTAAAGAATGGTTGGGCAATGTGCGGCAAAGGGACTGCTGTGTTCCAGC FEHIGGHIGIGJI6FCFHJIFFLJJCJGJHGFKKKKGIJKHFFKIFFFKHFLKHGKJLJGKILLEFFLIHJIEIIB AS:i:368 NH:i:1 NM:i:4
SAM Specification: http://samtools.sourceforge.net/sam1.pdf
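To make the column layout of an alignment line concrete, here is a small Python sketch that splits one of the records above into the eleven mandatory SAM fields, decodes two FLAG bits and reads the CIGAR string. The helper names are illustrative. The sketch splits on whitespace only because the slide text is space-separated; real SAM files are tab-delimited and are normally read with a dedicated library.

```python
import re

# The eleven mandatory SAM columns, in order.
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]


def parse_sam_line(line):
    """Split one alignment line into named fields plus optional tags."""
    parts = line.split()  # assumption: whitespace-separated slide text
    record = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    record["tags"] = parts[11:]
    return record


def reference_span(cigar):
    """Number of reference bases covered (M, D, N, = and X consume reference)."""
    operations = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in operations if op in "MDN=X")


def is_reverse_strand(flag):
    return bool(flag & 0x10)   # FLAG bit 16


def is_secondary(flag):
    return bool(flag & 0x100)  # FLAG bit 256


line = ("HWI-ST700660_138:2:1105:16101:50526#6@0/1 16 20 126103 246 53M4D23M * 0 0 "
        "AAGAAGTGCAAACCTGAAGAGATGCATGTAAAGAATGGTTGGGCAATGTGCGGCAAAGGGACTGCTGTGTTCCAGC "
        "FEHIGGHIGIGJI6FCFHJIFFLJJCJGJHGFKKKKGIJKHFFKIFFFKHFLKHGKJLJGKILLEFFLIHJIEIIB "
        "AS:i:368 NH:i:1 NM:i:4")
alignment = parse_sam_line(line)
print(alignment["rname"], alignment["pos"], alignment["cigar"],
      reference_span(alignment["cigar"]), is_reverse_strand(alignment["flag"]))
```

For this read the CIGAR `53M4D23M` covers 80 reference bases (53 + 4 + 23) because the 4-base deletion consumes reference but not read sequence.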
VCF format
http://www.1000genomes.org/
BED format
https://genome.ucsc.edu/faq/faqformat.html#format1
Introduction: Genomics Data Analysis Pipeline (studies of genomic variation)
Primary analysis, secondary analysis
1. Sequence preprocessing (Fastq)
2. Mapping (BAM)
3. Variant calling (VCF)
4. Variant annotation (VCF)
5. Variant prioritization
Tools: PanelMaps, BiERapp, TEAM, CSVS
Outline
1) Introduction to NGS Data Analysis
Web Tools:
2) TEAM
3) PanelMaps
Targeted Enrichment Analysis and Management
TEAM: Targeted Enrichment Analysis and Management
Can I interpret sequencing data for diagnosis?
http://team.babelomics.org/beta/
BIER
Introduction
Sequencing data + biological knowledge (ClinVar, HGMD, HUMSAVAR, COSMIC) -> TEAM -> Diagnosis
How does TEAM work?
Inputs:
1. VCF files
2. Gene panel
Databases: ClinVar, HUMSAVAR, HGMD, COSMIC
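The idea of combining VCF files with a gene panel can be illustrated with a toy Python sketch. It is not TEAM's implementation: the VCF records, gene names, coordinates and the known mutation are invented for illustration, and a real analysis would also query ClinVar, HUMSAVAR, HGMD and COSMIC.

```python
# Hypothetical VCF content (tab-separated); all variants are made up.
VCF_TEXT = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t1500\t.\tA\tG\t50\tPASS\t.
1\t9000\t.\tC\tT\t60\tPASS\t.
2\t300\t.\tG\tA\t45\tPASS\t.
"""

# Hypothetical panel: gene -> (chromosome, start, end), 1-based inclusive.
PANEL_GENES = {"GENE_A": ("1", 1000, 2000), "GENE_B": ("2", 100, 500)}
# Hypothetical mutations already registered in the panel.
PANEL_MUTATIONS = {("2", 300, "G", "A")}


def read_variants(vcf_text):
    """Yield (chrom, pos, ref, alt) for each data line of a VCF."""
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and the column header
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        yield chrom, int(pos), ref, alt


def classify(variant):
    """Label a variant relative to the panel definition."""
    chrom, pos, _ref, _alt = variant
    if variant in PANEL_MUTATIONS:
        return "known panel mutation"
    for gene, (gene_chrom, start, end) in PANEL_GENES.items():
        if chrom == gene_chrom and start <= pos <= end:
            return "in panel gene " + gene
    return "outside panel"


results = {variant: classify(variant) for variant in read_variants(VCF_TEXT)}
for variant, label in results.items():
    print(variant, label)
```

Checking known mutations before gene regions lets a panel-specific mutation take priority over the more general "falls inside a panel gene" label.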
Getting information
SIFT
SIFT predicts whether an amino acid substitution affects protein function.
Interpretation: scores range from 1 (tolerated) to 0 (not tolerated)
http://sift.jcvi.org/
PolyPhen
PolyPhen (Polymorphism Phenotyping) is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein.
Interpretation: scores range from 1 (probably damaging) to 0 (benign)
http://genetics.bwh.harvard.edu/pph2/index.shtml
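Because the two scores run in opposite directions (a low SIFT score is bad, a high PolyPhen score is bad), a small helper can make the interpretation explicit. The Python sketch below is illustrative only: the cutoffs (0.05 for SIFT, 0.908 and 0.446 for PolyPhen) are commonly quoted values that do not appear on the slide, so treat them as assumptions and check the documentation of each tool.

```python
def interpret_sift(score, cutoff=0.05):
    """SIFT: scores near 0 are not tolerated, scores near 1 are tolerated.

    cutoff=0.05 is an assumed, commonly quoted threshold.
    """
    return "not tolerated" if score < cutoff else "tolerated"


def interpret_polyphen(score, probably=0.908, possibly=0.446):
    """PolyPhen: scores near 1 are damaging, scores near 0 are benign.

    Both thresholds are assumed, commonly quoted values.
    """
    if score > probably:
        return "probably damaging"
    if score > possibly:
        return "possibly damaging"
    return "benign"


# Hypothetical scores for a single amino acid substitution.
sift_score, polyphen_score = 0.01, 0.99
print(interpret_sift(sift_score), "/", interpret_polyphen(polyphen_score))
```

Keeping the cutoffs as parameters makes it easy to apply a laboratory's own thresholds instead of the defaults assumed here.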
Getting information Consequence type or effect http://www.ensembl.org/info/genome/variation/predicted_data.html TEAM Targeted Enrichment Analysis and Management
How does TEAM work?
http://team.babelomics.org/beta/
1. Defining panel
2. Uploading input data
3. Getting results
How to define a panel?
1. Name of panel
2. Diseases
3. Adding:
- more genes
- mutations
4. Save panel
How to define a panel? Adding new mutations Checking mutations from Genome Viewer TEAM Targeted Enrichment Analysis and Management
Web results
1. OVERVIEW
2. DIAGNOSTIC
Web results 3. SECONDARY FINDINGS TEAM Targeted Enrichment Analysis and Management
Reporting results 4. REPORT PDF TEAM Targeted Enrichment Analysis and Management
Remarks
- TEAM is a free web tool
- Easy to use and powerful
- TEAM helps with diagnosis
More information
TEAM tutorial: http://ciberer.es/bier/team
BIER
Outline
1) Introduction to NGS Data Analysis
Web Tools:
2) TEAM
3) PanelMaps
PanelMaps
PanelMaps: a web tool to analyze gene panel data
Can I visualize and detect deletions in gene panel data?
BIER
PanelMaps: detecting altered regions for targeted sequencing
How does PanelMaps work?
Inputs:
1. BAM files
2. BED file
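One way to picture how alignments (BAM) and target regions (BED) combine is to compute the mean read depth per region and flag regions that fall well below the typical value. The Python sketch below is only an illustration under stated assumptions, not the PanelMaps algorithm: the BED regions and per-base depths are invented, the 0.5 ratio is arbitrary, and in practice the depths would be extracted from the BAM file (for example with `samtools depth`).

```python
from statistics import mean, median


def parse_bed(text):
    """Parse BED lines into (chrom, start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, *rest = line.split()
        name = rest[0] if rest else "{}:{}-{}".format(chrom, start, end)
        regions.append((chrom, int(start), int(end), name))
    return regions


def region_mean_depth(depth, region):
    """Mean depth over a region; BED starts are 0-based, ends are exclusive."""
    chrom, start, end, _name = region
    return mean(depth.get((chrom, pos), 0) for pos in range(start, end))


def flag_low_regions(depth, regions, ratio=0.5):
    """Return per-region means and the regions below ratio * median mean."""
    means = {region[3]: region_mean_depth(depth, region) for region in regions}
    typical = median(means.values())
    low = [name for name, value in means.items() if value < ratio * typical]
    return means, low


# Hypothetical target regions and per-base depths.
BED_TEXT = "20\t100\t110\texon1\n20\t200\t210\texon2\n20\t300\t310\texon3\n"
depth = {}
for start, value in [(100, 40), (200, 10), (300, 44)]:
    for pos in range(start, start + 10):
        depth[("20", pos)] = value

regions = parse_bed(BED_TEXT)
means, low_regions = flag_low_regions(depth, regions)
print(means, low_regions)
```

Comparing each region with the median across regions, rather than with a fixed depth, keeps the flag independent of the overall sequencing depth of the sample.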
How does PanelMaps work? PanelMaps Coverage description
How does PanelMaps work? http://panelmaps.juanes.xyz/dashboard/pmapsdemo PanelMaps Detection of altered regions
How does PanelMaps work? http://panelmaps.juanes.xyz/dashboard/pmapsdemo PanelMaps Detecting altered regions for targeted sequencing
Any comments or questions? Web tools for gene panel data