Browsing Genes and Genomes with Ensembl

1 Browsing Genes and Genomes with Ensembl. Emily Perry, Ensembl Outreach Project Leader, EMBL-EBI

2 Objectives: What is Ensembl? What type of data can you get in Ensembl? How to navigate the Ensembl browser website. Where to go for help and documentation.

3 This webinar course (date: webinar topic, instructor)
6th April: Introduction to Ensembl (Helen Sparrow)
13th April: Ensembl genes (Emily Perry)
20th April: Data export with BioMart (Victoria Newman)
27th April: Variation data in Ensembl and the Ensembl VEP (Victoria Newman)
4th May: Comparing genes and genomes with Ensembl Compara (Ben Moore)
11th May: Finding features that regulate genes: the Ensembl Regulatory Build (Ben Moore)
18th May: Uploading your data to Ensembl and advanced ways to access Ensembl data (Emily Perry)

4 Structure. Presentation: where Ensembl genes come from. Demo: getting gene data. Exercises: on the Train online course.

5 Questions? We've muted all the mics. Ask questions in the Chat box in the webinar interface; my Ensembl colleagues (Ben Moore, Victoria Newman) will respond during the talk. There's no threading, so please respond

6 Course exercises: ser-webinar-series-2016. This text will be replaced by a YouTube video of the webinar (with a link to YouKu too) and a PDF of the slides. A link to the exercises and their solutions will appear in the page hierarchy; the next page will be the exercises.

7 Get help with the exercises: use the exercise solutions in the online course; join our Facebook group and discuss the exercises with everybody (see the online course for the link); email us.

8 Genes and Transcripts EBI is an Outstation of the European Molecular Biology Laboratory.

9 Gene views. [Figure: transcript key. Transcripts numbered 0## come from Havana annotation and those numbered 2## from Ensembl annotation. Legend: coding exon, intron, non-coding exon; merged transcript, protein coding transcript, non-coding transcript.]

10 Golden transcripts: identical annotation from both sources, giving higher confidence and quality.

11 Ensembl and Havana annotation: Ensembl provides automatic annotation; Havana provides manual annotation.

12 Automatic gene annotation: genome-wide determination using the Ensembl automated pipeline. Predictions are based on experimental (biological) data: known proteins/cDNAs are plotted onto the genome using sequence matching.
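As a toy illustration of plotting a known cDNA onto the genome by sequence matching, this sketch places consecutive exact-matching pieces of a cDNA onto a genome string and reports them as exon coordinates. It is a simplified stand-in, not the Ensembl pipeline: the function name `map_cdna_to_genome` and the `min_exon` cut-off are hypothetical, and real aligners tolerate mismatches and use splice-site signals, which this sketch ignores.

```python
def map_cdna_to_genome(genome: str, cdna: str, min_exon: int = 5):
    """Toy split alignment (hypothetical helper, exact matches only).

    Walks along the cDNA left to right, placing the longest remaining
    segment that occurs in the genome downstream of the previous exon.
    Returns a list of half-open (start, end) genome coordinates, or None
    if part of the cDNA cannot be placed.
    """
    exons = []
    g_pos = 0  # genome position just after the previously placed exon
    i = 0      # current position in the cDNA
    while i < len(cdna):
        hit = None
        # Try the longest remaining cDNA segment first, then shorter ones.
        for length in range(len(cdna) - i, min_exon - 1, -1):
            start = genome.find(cdna[i:i + length], g_pos)
            if start != -1:
                hit = (start, start + length)
                break
        if hit is None:
            return None  # remaining cDNA has no exact match downstream
        exons.append(hit)
        g_pos = hit[1]
        i += hit[1] - hit[0]
    return exons


# Two "exons" separated by an intron-like stretch in a made-up genome.
genome = "AAAA" + "ATGCGTACGT" + "GTAAGTTTTTTTTAG" + "CCGGTTAACC" + "TTTT"
cdna = "ATGCGTACGT" + "CCGGTTAACC"
print(map_cdna_to_genome(genome, cdna))
```

The gap between the two placed segments is what the sketch treats as an intron; the greedy longest-match choice is only safe on toy input like this.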

13 Biological evidence: International Nucleotide Sequence databases; protein sequence databases (Swiss-Prot: manually curated; TrEMBL: unreviewed translations); NCBI RefSeq (manually annotated proteins and mRNAs: NP, NM).

14 Other species: infer genes from homology to other species, e.g. predict genes in one species by mapping cDNAs/proteins from another species to its genome. RNA-seq data are also used.

15 Manual gene annotation: genes are determined on a case-by-case basis by a person, either genome-wide or from a list of genes.

16 GENCODE. The GENCODE gene set is made up of Ensembl automatically annotated genes and Havana manually annotated genes, merged into one gene set. GENCODE is the default gene set used by ENCODE, 1000 Genomes and other major projects.
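A minimal sketch of how golden transcripts fall out of merging the two gene sets, assuming each transcript is reduced to its list of exon coordinates: a structure found in both the automatic (Ensembl) and manual (Havana) sets is flagged golden. The function `merge_gene_sets` and its labels are hypothetical, and the sketch simply keeps both versions when structures differ; the slides do not say how the real merge resolves such differences.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]        # (start, end) genomic coordinates
Structure = Tuple[Exon, ...]  # a transcript reduced to its exon chain


def merge_gene_sets(ensembl: List[List[Exon]],
                    havana: List[List[Exon]]) -> Dict[Structure, str]:
    """Merge automatic and manual transcript models by exon structure.

    Identical structures present in both sets are labelled 'golden';
    anything else keeps the label of the set it came from.
    """
    auto = {tuple(t) for t in ensembl}
    manual = {tuple(t) for t in havana}
    merged: Dict[Structure, str] = {}
    for structure in auto | manual:
        if structure in auto and structure in manual:
            merged[structure] = "golden"
        elif structure in manual:
            merged[structure] = "havana"
        else:
            merged[structure] = "ensembl"
    return merged


ensembl = [[(100, 200), (300, 400)], [(100, 200), (350, 400)]]
havana = [[(100, 200), (300, 400)], [(500, 600)]]
for structure, label in sorted(merge_gene_sets(ensembl, havana).items()):
    print(label, structure)
```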

18 CCDS transcripts: the Consensus Coding DNA Sequence set, agreed between EBI, WTSI, UCSC and NCBI.

19 Higher quality transcripts

20 Which transcript to use? GENCODE Basic: only the complete transcripts (where a gene has complete transcripts). Transcript support level (TSL): scored 1 to 5 for quality, where 1 is the best. APPRIS principal isoform: the major isoform(s), identified by combining protein structural information, functionally important residues and evidence from cross-species alignments. Also: CCDS and golden transcripts.
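One way to turn these criteria into a rule of thumb is a sort key that prefers GENCODE Basic, then the lowest TSL, then APPRIS principal, then CCDS, then golden. This ordering is an assumption made for illustration, not an Ensembl recommendation, and the `Transcript` record, its field names and the `tx1`/`tx2`/`tx3` IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transcript:
    stable_id: str
    gencode_basic: bool
    tsl: Optional[int]       # 1 (best) to 5, or None if not assessed
    appris_principal: bool
    ccds: bool
    golden: bool


def preference_key(t: Transcript):
    """Sort key (assumed ordering): smaller tuples sort first, i.e. preferred."""
    return (
        not t.gencode_basic,             # GENCODE Basic first
        t.tsl if t.tsl is not None else 6,  # unassessed TSL sorts last
        not t.appris_principal,
        not t.ccds,
        not t.golden,
    )


def pick_transcript(transcripts: List[Transcript]) -> Transcript:
    return min(transcripts, key=preference_key)


txs = [
    Transcript("tx1", True, 2, False, True, False),
    Transcript("tx2", True, 1, True, True, True),
    Transcript("tx3", False, 1, True, False, False),
]
print(pick_transcript(txs).stable_id)
```

Putting the flags in a tuple keeps the priority order explicit and easy to rearrange if a different criterion matters more for a given analysis.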

21 Ensembl stable IDs: ENSG########### (Ensembl Gene ID), ENST########### (Ensembl Transcript ID), ENSP########### (Ensembl Peptide ID), ENSE########### (Ensembl Exon ID). For non-human species, a species code is inserted after ENS: MUS (Mus musculus) for mouse, e.g. ENSMUSG###; DAR (Danio rerio) for zebrafish, e.g. ENSDARG###.
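The ID pattern can be checked with a regular expression. This sketch assumes the species code is always three capital letters and the numeric part is 11 digits, as on the slide; it also accepts an optional `.version` suffix (e.g. `ENST00000012345.2`), which Ensembl IDs often carry but the slide does not mention. The numbers in the examples are arbitrary placeholders, and `parse_stable_id` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional

FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# "ENS", an optional 3-letter species code, a feature letter, 11 digits,
# and an optional ".version" suffix (the suffix is an assumption).
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d{11})(?:\.(\d+))?$")


class StableId(NamedTuple):
    species_code: Optional[str]  # None for human IDs
    feature: str
    number: str
    version: Optional[int]


def parse_stable_id(text: str) -> StableId:
    m = STABLE_ID.match(text)
    if m is None:
        raise ValueError(f"not an Ensembl-style stable ID: {text!r}")
    species, letter, number, version = m.groups()
    return StableId(species, FEATURE_TYPES[letter], number,
                    int(version) if version else None)


for example in ["ENSG00000012345", "ENSMUSG00000012345", "ENST00000012345.2"]:
    print(example, parse_stable_id(example))
```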

22 Why Gene Ontology (GO)? Multiple terms are used for the same thing (e.g. non-specific immunity, innate immunity), and gene descriptions are too specific (e.g. natural killer cells, cytokines, complement, phagocyte, mast cells).

23 GO terms form a controlled vocabulary. GO: innate immune response. Innate immune responses are defense responses mediated by germline encoded components that directly recognise components of potential pathogens.

24 GO terms are hierarchical. [Figure: part of the GO hierarchy, with immune response above innate immune response. Terms shown below innate immune response include: complement activation, alternative pathway; complement activation, lectin pathway; hemolymph coagulation; induced systemic resistance; MAPK cascade involved in innate immune response; defence response, incompatible interaction; response to type I interferon; response to type II interferon; response to interferon-gamma; innate immune response in mucosa; virus induced gene silencing; melanisation defence response; natural killer cell mediated immunity; plant-type hypersensitive response; regulation of innate immune response; positive regulation of innate immune response; negative regulation of innate immune response.]
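Why the hierarchy matters in practice: a gene annotated to a specific term is implicitly annotated to all of that term's ancestors, so a search for the broader term should find it. This sketch combines that with the controlled-vocabulary idea by mapping free-text synonyms onto one term name. The parent links are a hypothetical, simplified subset following the slide's drawing rather than the real GO graph (which uses GO IDs and several relationship types), and the gene names are placeholders.

```python
from typing import Dict, Set

# Hypothetical, simplified parent links following the slide's drawing.
PARENTS = {
    "innate immune response": ["immune response"],
    "complement activation, alternative pathway": ["innate immune response"],
    "natural killer cell mediated immunity": ["innate immune response"],
    "response to type I interferon": ["innate immune response"],
}

# Free-text labels mapped onto one controlled term name.
SYNONYMS = {
    "non-specific immunity": "innate immune response",
    "innate immunity": "innate immune response",
}


def normalise(label: str) -> str:
    key = label.strip()
    return SYNONYMS.get(key.lower(), key)


def ancestors(term: str) -> Set[str]:
    """All terms reachable by following parent links upwards."""
    seen: Set[str] = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def genes_for(term: str, annotations: Dict[str, Set[str]]) -> Set[str]:
    """Genes annotated to the term itself or to any of its descendants."""
    term = normalise(term)
    return {gene for gene, terms in annotations.items()
            if any(t == term or term in ancestors(t) for t in terms)}


annotations = {
    "geneA": {"natural killer cell mediated immunity"},
    "geneB": {"complement activation, alternative pathway"},
    "geneC": {"immune response"},
}
print(sorted(genes_for("Innate immunity", annotations)))
print(sorted(genes_for("immune response", annotations)))
```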

25 Hands on: we're going to look at an Ensembl gene, ESPN, and find out information about it and its transcripts.
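Outside the browser, the same gene can be looked up programmatically through the Ensembl REST API's `/lookup/symbol/:species/:symbol` endpoint; with `expand=1` the response includes the gene's transcripts. This sketch assumes the public server at `https://rest.ensembl.org` and reads only the `id`, `display_name`, `biotype` and `Transcript` fields; the helper names are hypothetical, and the network call is left to the usage line so the functions can be tried offline.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"  # assumed public endpoint


def lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    """Build a /lookup/symbol URL asking for JSON with transcripts expanded."""
    path = f"/lookup/symbol/{species}/{urllib.parse.quote(symbol)}"
    query = urllib.parse.urlencode({"expand": 1, "content-type": "application/json"})
    return f"{REST_SERVER}{path}?{query}"


def fetch_gene(symbol: str, species: str = "homo_sapiens") -> dict:
    """Fetch the gene record (needs network access)."""
    with urllib.request.urlopen(lookup_url(symbol, species), timeout=30) as resp:
        return json.load(resp)


def summarise(record: dict) -> list:
    """One line for the gene, then one indented line per transcript."""
    lines = [f"{record['id']} {record.get('display_name', '')} {record.get('biotype', '')}"]
    for tx in record.get("Transcript", []):
        lines.append(f"  {tx['id']} {tx.get('display_name', '')} {tx.get('biotype', '')}")
    return lines


print(lookup_url("ESPN"))
```

Usage, with network access: `for line in summarise(fetch_gene("ESPN")): print(line)`.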

26 Next webinar: Data export with BioMart (Victoria Newman). Ensembl data can be exported in bulk using BioMart. BioMart is a flexible tool that lets you specify which Ensembl features you want data for and what data you want to see about them, then export those data as a table or as sequences. Learn the basics of running a BioMart query and explore some of the options that are available.

30 Help and documentation: course online; tutorials; Flash animations; email us; Ensembl public mailing lists.

31 Follow us

33 Ensembl acknowledgements: the entire Ensembl team. Funding: co-funded by the European Union.