Annotating your variants: Ensembl Variant Effect Predictor (VEP) Helen Sparrow Ensembl EMBL-EBI 2nd November 2016


1 Training materials. Ensembl training materials are protected by a CC BY license. If you wish to re-use these materials, please credit Ensembl for their creation. If you use Ensembl for your work, please cite our papers.

2 Annotating your variants: Ensembl Variant Effect Predictor (VEP) Helen Sparrow Ensembl EMBL-EBI 2nd November 2016 EBI is an Outstation of the European Molecular Biology Laboratory.

3 Course materials: VEP Presentation, VEP Coursebook (screenshots of demo), VEP Exercises

4 Ensembl Features - Gene builds for ~70 species - Gene trees - Regulatory build - Variation display and VEP - Display of user data - BioMart (data export) - Programmatic access via the APIs - Completely Open Source

5 What is the VEP? A tool to determine the effect of variants (SNPs, insertions, deletions, CNVs or structural variants). Input: - Variant coordinates - VCF - HGVS - Variant IDs. Output: - Affected gene, transcript and protein sequence - Pathogenicity - Frequency data - Regulatory consequences - Splicing consequences - Literature citations

6 What is the VEP? Available as: - Web interface - Perl script - REST API (XML). See ensembl.org/tools/vep and rest.ensembl.org
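As a rough illustration of the REST route, the sketch below builds a request URL for the `/vep/:species/id/:id` and `/vep/:species/hgvs/:hgvs_notation` endpoints on rest.ensembl.org and fetches the JSON response. The helper names `build_vep_url` and `fetch_vep` are hypothetical, and the example identifiers are illustrative inputs that are not taken from the slides.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def build_vep_url(species, kind, query):
    """Build a VEP REST URL (hypothetical helper).

    kind is "id" for a variant identifier or "hgvs" for HGVS notation.
    """
    if kind not in ("id", "hgvs"):
        raise ValueError("kind must be 'id' or 'hgvs'")
    # Percent-encode characters such as '>' that occur in HGVS strings.
    encoded = urllib.parse.quote(query, safe=":")
    return f"{REST_SERVER}/vep/{urllib.parse.quote(species)}/{kind}/{encoded}"


def fetch_vep(species, kind, query, timeout=30):
    """Send the GET request and decode the JSON body (needs network access)."""
    request = urllib.request.Request(
        build_vep_url(species, kind, query),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative identifier; replace with your own variant.
    print(build_vep_url("human", "id", "rs699"))
```

Calling `fetch_vep("human", "id", "rs699")` should return a list of result records, each of which is expected to carry fields such as the most severe consequence; inspect the response rather than relying on a fixed layout.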

7 Features of VEP - Web, Perl script and REST API - Over 5,000 species - Input - Transcript sets - Regulatory regions - Known variants - Plugins - Allele frequencies - Associated phenotype, disease or trait - Clinical significance states - Important projects which use VEP

8 Features of VEP. Web: 50 MB (~2 million variants). Script: unlimited variants.

9 Features of VEP. Input: VCF, rsID, HGVS, BED, Pileup.

10 Features of VEP. Transcript sets: GENCODE, GENCODE Basic, RefSeq, GENCODE & RefSeq.

11 Features of VEP. Regulatory regions: the Ensembl Regulatory Build (ENCODE, BLUEPRINT, NIH Epigenomics Roadmap). Can be limited to regulatory regions observed in specific cell types.

12 Features of VEP. Known variants: dbSNP, COSMIC, ClinVar, ESP, HGMD-Public, PhenCode.

13 Features of VEP. Plugins, e.g. splicing predictions, loss-of-function predictions, expression levels across transcripts. Anything - customisable!

14 Features of VEP. Allele frequencies: 1000 Genomes, ESP, ExAC projects. gnomAD - coming soon!

15 Features of VEP. Associated phenotype, disease or trait: OMIM, Orphanet, GWAS Catalog, others.

16 Features of VEP. Clinical significance states: assigned by ClinVar.

17 Features of VEP. Important projects which use VEP: 1000 Genomes, ExAC, DECIPHER, OpenTargets, LRG, gnomAD.

18 Your own variant data. Variant coordinates: /C, T/C, T/A, G/C, C/T. HGVS notation: ENST :c.1047_1048insC, 5:g T>C, NM_ :c.7C>T, ENSP :p.Ala2233Asp, NP_ :p.Ile2285Val. VCF: #CHROM POS ID REF ALT (REF: G, A, T, A; ALT: A, G,T, T, .). Variant IDs: rs, COSM, rs, FANCD1:c.475G>A, rs.
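To make the difference between these input styles concrete, here is a minimal sketch that guesses which style a single input line uses. The function `guess_input_format` and its heuristics are hypothetical and are not the detection logic used by the VEP itself; the coordinates in the usage lines are made up for illustration.

```python
import re

# An HGVS string has a reference, a colon, then a type prefix such as "c." or "p."
HGVS_PATTERN = re.compile(r"^[^\s:]+:[cgmnpr]\.")


def guess_input_format(line):
    """Guess the input style of one line (illustrative heuristic only)."""
    stripped = line.strip()
    if not stripped:
        return "unknown"
    if stripped.startswith("#"):
        return "header"  # e.g. the "#CHROM POS ID REF ALT" line of a VCF
    fields = stripped.split()
    if len(fields) == 1:
        # A single token is either HGVS notation or a variant identifier.
        return "hgvs" if HGVS_PATTERN.match(fields[0]) else "id"
    # Coordinate style: chromosome, start, end, allele string "REF/ALT", strand.
    if (len(fields) >= 4 and fields[1].isdigit()
            and fields[2].isdigit() and "/" in fields[3]):
        return "coordinates"
    # VCF style: CHROM, POS, ID, REF, ALT (at least five columns).
    if len(fields) >= 5 and fields[1].isdigit():
        return "vcf"
    return "unknown"


if __name__ == "__main__":
    for example in ["rs699", "FANCD1:c.475G>A",
                    "9 1000 1000 A/G +", "9\t1000\t.\tA\tG"]:
        print(example.replace("\t", " "), "->", guess_input_format(example))
```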

19 Variant types. 1) Small scale, in one or a few nucleotides of a gene: single nucleotide polymorphisms (SNPs); small insertions and deletions (DIPs or indels). 2) Large scale, in chromosomal structure (structural variants): copy number variants (CNVs); large deletions/duplications, insertions, translocations. [Figure: aligned sequences illustrating a SNP and an indel; diagrams of deletion, duplication, insertion and translocation]
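The small-scale classes can be told apart from the reference and alternate alleles alone. The sketch below is one simple way to do so, assuming VCF-style alleles (with a shared leading base for indels) or a "-" placeholder for an empty allele; `classify_alleles` is a hypothetical helper, and large structural variants are out of its scope.

```python
def classify_alleles(ref, alt):
    """Classify a small variant from its REF and ALT alleles (illustrative)."""
    # Treat "-" as an empty allele, as used in some coordinate-style inputs.
    ref = ref.replace("-", "").upper()
    alt = alt.replace("-", "").upper()
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt) and alt.startswith(ref):
        return "insertion"   # ALT extends REF, e.g. T -> TC
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"    # REF extends ALT, e.g. TA -> T
    if len(ref) == len(alt):
        return "substitution"  # several bases replaced, length unchanged
    return "complex indel"


if __name__ == "__main__":
    for ref, alt in [("C", "A"), ("T", "TC"), ("TA", "T"), ("A", "-")]:
        print(f"{ref}>{alt}: {classify_alleles(ref, alt)}")
```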

20 Variant consequences. [Figure: transcript diagram labelling regulatory, 5′ upstream, 5′ UTR, coding (synonymous, missense), splice site, intronic, 3′ UTR and 3′ downstream regions] - Identify transcripts that overlap the coordinates of the variants (GENCODE, RefSeq or both) - Predict the consequences of the variants
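The first step, finding transcripts that overlap a variant, amounts to an interval check. The following toy sketch shows the idea with a hypothetical `Transcript` record and made-up coordinates; it only distinguishes exonic from intronic positions and is far simpler than the consequence calculation the VEP performs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transcript:
    """Toy transcript model with 1-based inclusive coordinates (hypothetical)."""
    name: str
    start: int
    end: int
    exons: Tuple[Tuple[int, int], ...]


def overlapping_transcripts(position: int,
                            transcripts: List[Transcript]) -> List[Transcript]:
    """Return the transcripts whose span contains the variant position."""
    return [t for t in transcripts if t.start <= position <= t.end]


def locate(position: int, transcript: Transcript) -> str:
    """Say whether a position is exonic, intronic or outside the transcript."""
    if not transcript.start <= position <= transcript.end:
        return "outside"
    for exon_start, exon_end in transcript.exons:
        if exon_start <= position <= exon_end:
            return "exonic"
    return "intronic"


# Made-up transcripts for illustration only.
TOY_TRANSCRIPTS = [
    Transcript("TX_A", 100, 500, ((100, 200), (400, 500))),
    Transcript("TX_B", 450, 900, ((450, 900),)),
]

if __name__ == "__main__":
    for pos in (150, 300, 460, 950):
        hits = overlapping_transcripts(pos, TOY_TRANSCRIPTS)
        print(pos, [(t.name, locate(pos, t)) for t in hits])
```

A single variant can overlap several transcripts, which is why one input variant may produce several result lines, one per transcript.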

21 Consequence terms

22 Missense variants - pathogenicity. SIFT: Deleterious, Tolerated. PolyPhen: Probably damaging, Possibly damaging, Benign.
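Both tools report a score between 0 and 1 alongside the qualitative label. The sketch below maps scores to labels using the cut-offs given in the Ensembl documentation as we understand it (SIFT below 0.05 is deleterious; PolyPhen above 0.908 is probably damaging and above 0.446 is possibly damaging). These thresholds are an assumption that does not appear on the slide, and the function names are hypothetical.

```python
def _check_score(score):
    """Both tools report scores in the closed interval [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")


def sift_label(score):
    """Lower SIFT scores are more likely to be damaging (assumed cut-off 0.05)."""
    _check_score(score)
    return "deleterious" if score < 0.05 else "tolerated"


def polyphen_label(score):
    """Higher PolyPhen scores are more likely to be damaging (assumed cut-offs)."""
    _check_score(score)
    if score > 0.908:
        return "probably damaging"
    if score > 0.446:
        return "possibly damaging"
    return "benign"


if __name__ == "__main__":
    print(sift_label(0.01), "/", polyphen_label(0.95))
    print(sift_label(0.30), "/", polyphen_label(0.20))
```

Note that the two scales run in opposite directions, so a variant with a low SIFT score and a high PolyPhen score is flagged by both tools.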

23 VEP plugins - Plugins add extra functionality to the VEP - They may extend, filter or manipulate the output of the VEP - Plugins may make use of external data or code - Available on the web tool and with the script

24 Pathogenicity Prediction Plugins - dbNSFP: annotation database for missense SNPs - Condel: consensus deleteriousness from SIFT and PolyPhen - LoFtool: ranks susceptibility to disease based on the ratio of loss-of-function to synonymous variants in ExAC data

25 Hands on. We have identified four variants on human chromosome 9: an A deletion at , C->A at , C->G at and G->A at . We will use the Ensembl VEP to determine: - Whether the variants have already been annotated in Ensembl - Which genes are affected by the variants - Whether any of the variants affect gene regulation

26 Questions?

27 Help and documentation: Course online, Tutorials, Videos, Contact us

28 Host a FREE Workshop! Invite one of our outreach team to teach at your institution for free (except trainers' expenses). Contact us: helpdesk@ensembl.org. Browser course: ½ - 2 day course on the Ensembl browser, aimed at wet-lab scientists; 1-2 trainers. API course: 2-4 day course on the Ensembl APIs (Perl or REST), aimed at bioinformaticians; 1-4 trainers.

29 Acknowledgements: the entire Ensembl team. Funding: co-funded by the European Union.
