Training materials.

Similar documents
Training materials.

Annotating your variants: Ensembl Variant Effect Predictor (VEP) Helen Sparrow Ensembl EMBL-EBI 2nd November 2016

Training materials.

Browsing Genes and Genomes with Ensembl

Ensembl Tools. EBI is an Outstation of the European Molecular Biology Laboratory.

Applied Bioinformatics

Guided tour to Ensembl

Browsing Genes and Genomes with Ensembl

In silico variant analysis: Challenges and Pitfalls

Overview of the next two hours...

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

Ensembl and ENA. High level overview and use cases. Denise Carvalho-Silva. Ensembl Outreach Team

Exploring genomic databases: Practical session "

SeattleSNPs Interactive Tutorial: Database Inteface Entrez, dbsnp, HapMap, Perlegen

Linking Genetic Variation to Important Phenotypes

Briefly, this exercise can be summarised by the follow flowchart:

Chapter 2: Access to Information

INTRODUCTION TO MOLECULAR GENETICS. Andrew McQuillin Molecular Psychiatry Laboratory UCL Division of Psychiatry 22 Sept 2017

BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

The University of California, Santa Cruz (UCSC) Genome Browser

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Gene-centered databases and Genome Browsers

Gene-centered databases and Genome Browsers

Ensembl: A New View of Genome Browsing

Niemann-Pick Type C Disease Gene Variation Database ( )

Week 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

Automating the ACMG Guidelines with VSClinical. Gabe Rudy VP of Product & Engineering

INTRODUCTION TO BIOINFORMATICS. SAINTS GENETICS Ian Bosdet

How to use Variant Effects Report

Hands-On Four Investigating Inherited Diseases

Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager

Browsing Genes and Genomes with Ensembl

Variant prioritization in NGS studies: Annotation and Filtering "

GREG GIBSON SPENCER V. MUSE

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

Genomics: Human variation

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Genomics: Genome Browsing & Annota3on

Using VarSeq to Improve Variant Analysis Research

The Human Genome Project has always been something of a misnomer, implying the existence of a single human genome

Lecture 2: Biology Basics Continued

Axiom mydesign Custom Array design guide for human genotyping applications

Structure, Measurement & Analysis of Genetic Variation

CITATION FILE CONTENT / FORMAT

Interpreting Genome Data for Personalised Medicine. Professor Dame Janet Thornton EMBL-EBI

Ensembl workshop. Thomas Randall, PhD bioinformatics.unc.edu. handouts, papers, datasets

Investigating Inherited Diseases

Genetics and Bioinformatics

Applied Bioinformatics

MAKING WHOLE GENOME ALIGNMENTS USABLE FOR BIOLOGISTS. EXAMPLES AND SAMPLE ANALYSES.

KnetMiner USER TUTORIAL

Introduc)on to Databases and Resources Biological Databases and Resources

SNP calling and VCF format

SNPs - GWAS - eqtls. Sebastian Schmeier

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR

Infinium Global Screening Array-24 v1.0 A powerful, high-quality, economical array for population-scale genetic studies.

Towards Personal Genomics

Gene-centered resources at NCBI

Prioritization: from vcf to finding the causative gene

Patrocles: a database of polymorphic mir-mediated gene regulation in vertebrates

European Genome phenome Archive at the European Bioinformatics Institute. Helen Parkinson Head of Molecular Archives

Bioinformatics for Proteomics. Ann Loraine

USDA NRSP-8 BIOINFORMATICS COORDINATION PROGRAM ACTIVITIES

THE HEALTH AND RETIREMENT STUDY: GENETIC DATA UPDATE

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

From genome-wide association studies to disease relationships. Liqing Zhang Department of Computer Science Virginia Tech

user s guide Question 1

SUPPLEMENTARY INFORMATION

About Strand NGS. Strand Genomics, Inc All rights reserved.

Next Generation Genetics: Using deep sequencing to connect phenotype to genotype

Functional Annotation and Prioritization of Whole Exome and Whole Genome Sequencing Variants. Mulin Jun Li

Introduction to NGS analyses

This practical aims to walk you through the process of text searching DNA and protein databases for sequence entries.

Lecture 2: Biology Basics Continued. Fall 2018 August 23, 2018

Introduction to RNA-Seq in GeneSpring NGS Software

Course: IT and Health

user s guide Question 3

Variant annotation. Variant Effect Predictor (VEP) Edinburgh Genomics. Marta Bleda Latorre. 23 October

ANNOVAR Variant Annotation and Interpretation

Introduction to the UCSC genome browser

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes

The Human Genome and its upcoming Dynamics

1 st transplant user training workshop Versailles, 12th-13th November 2012

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014

Introduction to human genomics and genome informatics

Variant annotation. ANNOVAR and CellBase. University of Cambridge. Javier López. Acknowledgements: Marta Bleda Latorre.

Result Tables The Result Table, which indicates chromosomal positions and annotated gene names, promoter regions and CpG islands, is the best way for

Genomic Resources and Gene/QTL Discovery in Livestock

Variant calling in NGS experiments

Figure 1. FasterDB SEARCH PAGE corresponding to human WNK1 gene. In the search page, gene searching, in the mouse or human genome, can be done: 1- By

Reads to Discovery. Visualize Annotate Discover. Small DNA-Seq ChIP-Seq Methyl-Seq. MeDIP-Seq. RNA-Seq. RNA-Seq.

SENIOR BIOLOGY. Blueprint of life and Genetics: the Code Broken? INTRODUCTORY NOTES NAME SCHOOL / ORGANISATION DATE. Bay 12, 1417.

Databases/Resources on the web

RNA-Seq Analysis. August Strand Genomics, Inc All rights reserved.

Deleterious mutations

The 150+ Tomato Genome (re-)sequence Project; Lessons Learned and Potential

Browsing Genomes with Ensembl Genomes

earray 5.0 Create your own Custom Microarray Design

Navigating the GTEx eqtl database and using TaqMan Assays to verify eqtl links in disease research

Browsing Genomes with Ensembl

Transcription:

Training materials - Ensembl training materials are protected by a CC BY license - http://creativecommons.org/licenses/by/4.0/ - If you wish to re-use these materials, please credit Ensembl for their creation - If you use Ensembl for your work, please cite our papers - http://www.ensembl.org/info/about/publications.html

Viewing the data: the Ensembl browser and its possibilities Benjamin Moore Ensembl Outreach Officer Slides available from training.ensembl.org/events/ EBI is an Outstation of the European Molecular Biology Laboratory.

Why do we need genome browsers? 1977: 1st genome to be sequenced (5 kb) 2004: finished human sequence (3 Gb)

Why do we need genome browsers? CGGCCTTTGGGCTCCGCCTTCAGCTCAAGACTTAACTTCCCTCCCAGCTGTCCCAGATGACGCCATCTGAAATTTCTTGGAAACACGATCAC TTTAACGGAATATTGCTGTTTTGGGGAAGTGTTTTACAGCTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAATTGCTGTATTC CGAAGACATGCTGATGGGAATTACCAGGCGGCGTTGGTCTCTAACTGGAGCCCTCTGTCCCCACTAGCCACGCGTCACTGGTTAGCGTGATT GAAACTAAATCGTATGAAAATCCTCTTCTCTAGTCGCACTAGCCACGTTTCGAGTGCTTAATGTGGCTAGTGGCACCGGTTTGGACAGCACA GCTGTAAAATGTTCCCATCCTCACAGTAAGCTGTTACCGTTCCAGGAGATGGGACTGAATTAGAATTCAAACAAATTTTCCAGCGCTTCTGA GTTTTACCTCAGTCACATAATAAGGAATGCATCCCTGTGTAAGTGCATTTTGGTCTTCTGTTTTGCAGACTTATTTACCAAGCATTGGAGGA ATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGGTATTGACAA ATTTTATATAACTTTATAAATTACACCGAGAAAGTGTTTTCTAAAAAATGCTTGCTAAAAACCCAGTACGTCACAGTGTTGCTTAGAACCAT AAACTGTTCCTTATGTGTGTATAAATCCAGTTAACAACATAATCATCGTTTGCAGGTTAACCACATGATAAATATAGAACGTCTAGTGGATA AAGAGGAAACTGGCCCCTTGACTAGCAGTAGGAACAATTACTAACAAATCAGAAGCATTAATGTTACTTTATGGCAGAAGTTGTCCAACTTT TTGGTTTCAGTACTCCTTATACTCTTAAAAATGATCTAGGACCCCCGGAGTGCTTTTGTTTATGTAGCTTACCATATTAGAAATTTAAAACT AAGAATTTAAGGCTGGGCGTGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACTTGAGGCCAGAAGTTTGA GACCAGCCTGGCCAACATGGTGAAACCCTATCTCTACTAAAAATACAAAAAATGTGCTGCGTGTGGTGGTGCGTGCCTGTAATCCCAGCTAC ACGGGAGGTGGAGGCAGGAGAATCGCTTGAACCCTGGAGGCAGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCTAGCCTGGGCCACA TAGCATGACTCTGTCTCAAAACAAACAAACAAACAAAAAACTAAGAATTTAAAGTTAATTTACTTAAAAATAATGAAAGCTAACCCATTGCA TATTATCACAACATTCTTAGGAAAAATAACTTTTTGAAAACAAGTGAGTGGAATAGTTTTTACATTTTTGCAGTTCTCTTTAATGTCTGGCT AAATAGAGATAGCTGGATTCACTTATCTGTGTCTAATCTGTTATTTTGGTAGAAGTATGTGAAAAAAAATTAACCTCACGTTGAAAAAAGGA ATATTTTAATAGTTTTCAGTTACTTTTTGGTATTTTTCCTTGTACTTTGCATAGATTTTTCAAAGATCTAATAGATATACCATAGGTCTTTC CCATGTCGCAACATCATGCAGTGATTATTTGGAAGATAGTGGTGTTCTGAATTATACAAAGTTTCCAAATATTGATAAATTGCATTAAACTA TTTTAAAAATCTCATTCATTAATACCACCATGGATGTCAGAAAAGTCTTTTAAGATTGGGTAGAAATGAGCCACTGGAAATTCTAATTTTCA TTTGAAAGTTCACATTTTGTCATTGACAACAAACTGTTTTCCTTGCAGCAACAAGATCACTTCATTGATTTGTGAGAAAATGTCTACCAAAT TATTTAAGTTGAAATAACTTTGTCAGCTGTTCTTTCAAGTAAAAATGACTTTTCATTGAAAAAATTGCTTGTTCAGATCACAGCTCAACATG AGTGCTTTTCTAGGCAGTATTGTACTTCAGTATGCAGAAGTGCTTTATGTATGCTTCCTATTTTGTCAGAGATTATTAAAAGAAGTGCTAAA GCATTGAGCTTCGAAATTAATTTTTACTGCTTCATTAGGACATTCTTACATTAAACTGGCATTATTATTACTATTATTTTTAACAAGGACAC TCAGTGGTAAGGAATATAATGGCTACTAGTATTAGTTTGGTGCCACTGCCATAACTCATGCAAATGTGCCAGCAGTTTTACCCAGCATCATC TTTGCACTGTTGATACAAATGTCAACATCATGAAAAAGGGTTGAAAAAAGGAATATTTTAATAGTTTTCAGTTACTTTATGACTGTTAGCTA

Ensembl- unlocking the code - Genomic assemblies - automated gene annotation - Variation - Small and large scale sequence variation with phenotype associations - Comparative Genomics - Whole genome alignments, gene trees - Regulation - Potential promoters and enhancers, DNA methylation

Ensembl Features - Gene builds for ~100 species - Gene trees - Regulatory build - Variation display and VEP - Display of user data - BioMart (data export) - Programmatic access via the APIs - Completely Open Source

Ensembl- access to 100+ genomes

Ensembl Genomes- expanding Ensembl www.ensembl.org - Vertebrates - Other representative species

Ensembl Genomes- expanding Ensembl www.ensembl.org - - Vertebrates Other representative species www.ensemblgenomes.org - Bacteria - Fungi - Protists - Metazoa - Plants

Variation types 1) Small scale in one or few nucleotides of a gene Small insertions and deletions (DIPs or indels) Single nucleotide polymorphism (SNP) A G A C T T G A C C T G T C T - A A C T G G A T G A C T T G A C - T G T C T G A A C G G G A 2) Large scale in chromosomal structure (structural variation) Copy number variations (CNV) Large deletions/duplications, insertions, translocations deletion duplication insertion translocation

22 species with variation data

Variation sources http://www.ensembl.org/info/docs/variation/sources_documentation.html

The Ensembl variation process Import reference variations Apply quality control Import richer data Apply analysis VEP Web Site Biomart Perl API REST API

Quality Control We summarise the information supporting each dbsnp variant to give a guide to its reliability We run basic checks and flag suspicious variants

Finding your variant You can find variant information by: - searching for a gene or phenotype and examining the variant or associated features table - viewing a genomic location and clicking on a variant in one of the variant tracks - searching directly by variant name

The Variant Tab Summary data present on all pages Menu of pages, available on all pages Icons representing available pages (grey when no data available)

Allele Frequencies- the 1000 Genomes Project The 1000 Genomes Phase 3 data provides genotype data for 2504 individuals from 26 different narrowly defined populations in 5 groupings.

Allele Frequencies- GnomAD The Genome Aggregation Database provides allele frequency data for over 130,000 samples from 7 different populations Sample numbers From: https://macarthurlab.org/2017/02/27/the-genome-aggregation-database-gnomad/

Allele Frequencies Concise summary

Allele Frequency Distributions Pie charts and tables display frequency data for different studies. These include TOPMed and UK10K (ALSPAC and TWINS UK) for human, WTSI Mouse Genomes Project and NextGen

Phenotype and Disease Data NHGRI-EBI GWAS Catalog Variant Gene Structural Variant Attributes types held for phenotype features include: - clinical significance reported by ClinVar - inheritance type - reported genes /variants - risk allele p-value odds ratio beta coefficient QTL

Terminology A disease can be considered as a bundle of phenotypes observed in the subjects, often named after the person who studied and characterised it. Disease Phenotypes Cognitive impairment HP:0100543 Baldwin Syndrome Aggressive behavior HP:0000718 Small hands HP:0200055 Hand size can also be considered a trait

The Naming Problem Different sources describe the phenotypes/ diseases in different ways - We need to normalise the descriptions for easy querying - Also need to respect the original data Features are often reported as associated with a disease sub-type making it complex to extract all loci of interest

The Solution Use phenotype and disease ontologies to organise descriptions and allow query and display by synonym and child/parent term.

Viewing Ontology Mappings by Variant Additional columns in the table of diseases linked to a variant show the Ontology terms we have mapped to the description It is now clear these descriptions refer to the same disease

Phenotype and Disease Data Very concise summary

Phenotype and Disease Data

Citation Data - Sources dbsnp - Publication information is submitted with the variant submission. EuropePMC - Publications for which the full text is freely available in PubMed Central are mined for refsnp identifiers. UCSC Genocoding Project - Publications for which the text is freely available or those from publishers who have agreed to data mining are mined for refsnp identifiers. Species Citations sheep 31 horse 58 pig 73 rat 90 dog 200 chicken 414 cow 1,625 mouse 8,309 human 594,364

Citation Data Very concise summary

Citation Data Link to full text

Variant Annotation We analyse the imported variation data utilising other Ensembl resources. Annotation provided includes: - Comparison to Ensembl gene structures to predict transcript/ protein conseqeunces - Comparison to Ensembl regulatory features to find co-location - Comparison to other species to predict ancestral allele, show phylogenic context - Linkage disequilibrium calculations for the 1000 Genomes populations

Variant consequences CODING Synonymous Regulatory CODING Non-synonymous AAAAAAA ATG 5 Upstream 5 UTR Splice site Intronic 3 UTR We calculated the predicted consequences of variants on the Ensembl transcript set and assign Sequence Ontology terms 3 Downstream

Genes and Regulation Concise summary

Genes and Regulation Tables show the predicted impact of a variant on all of the transcripts it overlaps The SIFT and PolyPhen2 packages predict how likely a variant is to be damaging to a protein. Red indicates damaging; green indicates tolerated; blue indicates a low quality prediction

Summary The Ensembl genome browser displays: - imported variation data, including phenotype associations, allele frequencies and literature citations from a variety of different sources - additional variant annotation, including functional consequence predictions - this data can be accessed through the web-interface, BioMart, Perl API, REST API and the FTP site

Acknowledgements The Entire Ensembl Team Funding Co-funded by the European Union

Training materials - Ensembl training materials are protected by a CC BY license - http://creativecommons.org/licenses/by/4.0/ - If you wish to re-use these materials, please credit Ensembl for their creation - If you use Ensembl for your work, please cite our papers - http://www.ensembl.org/info/about/publications.html