Browsing Genes and Genomes with Ensembl
- Cuthbert Walters
- 5 years ago
Transcription
1 Browsing Genes and Genomes with Ensembl Emily Perry Ensembl Outreach Project Leader EMBL-EBI
2 Objectives What is Ensembl? What type of data can you get in Ensembl? How to navigate the Ensembl browser website. Where to go for help and documentation.
3 This webinar course
Date | Webinar topic | Instructor
6th April | Introduction to Ensembl | Helen Sparrow
13th April | Ensembl genes | Emily Perry
20th April | Data export with BioMart | Victoria Newman
27th April | Variation data in Ensembl and the Ensembl VEP | Victoria Newman
4th May | Comparing genes and genomes with Ensembl Compara | Ben Moore
11th May | Finding features that regulate genes: the Ensembl Regulatory Build | Ben Moore
18th May | Uploading your data to Ensembl and advanced ways to access Ensembl data | Emily Perry
4 Structure Presentation: Where Ensembl genes come from. Demo: Getting gene data. Exercises: On the Train online course.
5 Questions? We've muted all the mics. Ask questions in the Chat box in the webinar interface. My Ensembl colleagues will respond during the talk. There's no threading so please respond Ben Moore Victoria Newman
6 Course exercises ser-webinar-series-2016 This text will be replaced by a YouTube (link to YouKu too) video of the webinar and a pdf of the slides. A link to exercises and their solutions will appear in the page hierarchy The next page will be the exercises
7 Get help with the exercises Use the exercise solutions in the online course Join our Facebook group and discuss the exercises with everybody (see the online course for the link) us
8 Genes and Transcripts EBI is an Outstation of the European Molecular Biology Laboratory.
9 Gene views [Figure: transcript diagram. Legend: coding exon, non-coding exon, intron; merged transcript, protein coding transcript, non-coding transcript. Transcript name numbering: 0## - Havana annotation, 2## - Ensembl annotation.]
10 Golden transcripts Identical annotation between Ensembl and Havana; higher confidence and quality
11 Ensembl and Havana annotation Automatic annotation Manual annotation
12 Automatic gene annotation Genome-wide determination using the Ensembl automated pipeline. Predictions based on experimental (biological) data. Known proteins/cDNAs plotted onto the genome using sequence matching.
13 Biological Evidence International Nucleotide Sequence databases. Protein sequence databases: Swiss-Prot (manually curated), TrEMBL (unreviewed translations). NCBI RefSeq: manually annotated proteins and mRNAs (NP, NM).
14 Other species Infer genes from homology to other species, e.g. predict genes in one species by mapping cDNAs/proteins from another species to the genome. RNAseq data
15 Manual gene annotation Gene determination on a case-by-case basis by a person. Genome-wide Genes list
16 GENCODE The GENCODE gene set is the merged gene set, made up of Ensembl automatically annotated genes and Havana manually annotated genes. GENCODE is the default gene set used by ENCODE, 1000 Genomes and other major projects.
17 Golden transcripts Identical annotation between Ensembl and Havana; higher confidence and quality
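The merge described on the GENCODE and golden-transcript slides can be pictured with a toy sketch. This is not the Ensembl pipeline: it assumes, for illustration only, that "identical annotation" means an identical list of exon coordinates, and the transcript names and coordinates are invented.

```python
from typing import Dict, List, Tuple

# A transcript structure as an ordered tuple of (start, end) exon coordinates.
Exons = Tuple[Tuple[int, int], ...]


def merge_gene_sets(automatic: Dict[str, Exons], manual: Dict[str, Exons]) -> List[dict]:
    """Merge two toy annotation sets; identical structures become 'golden'."""
    manual_by_structure = {exons: name for name, exons in manual.items()}
    matched_manual = set()
    merged = []

    for name, exons in automatic.items():
        partner = manual_by_structure.get(exons)
        if partner is None:
            # Only the automatic set has this structure.
            merged.append({"names": (name,), "source": "ensembl", "golden": False})
        else:
            # Both sets agree exactly: merge into one golden transcript.
            matched_manual.add(partner)
            merged.append({"names": (name, partner), "source": "ensembl_havana", "golden": True})

    for name in manual:
        if name not in matched_manual:
            # Only the manual set has this structure.
            merged.append({"names": (name,), "source": "havana", "golden": False})

    return merged


# Hypothetical data: one structure is shared, the others are unique to one set.
automatic = {"auto_1": ((100, 200), (300, 400)), "auto_2": ((100, 250),)}
manual = {"manual_1": ((100, 200), (300, 400)), "manual_2": ((500, 600),)}
merged = merge_gene_sets(automatic, manual)
```

Here `auto_1` and `manual_1` share the same exons, so they collapse into a single golden entry; the other two transcripts are kept with their single source.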
18 CCDS transcripts Consensus coding DNA sequence set: agreement between EBI, WTSI, UCSC and NCBI
19 Higher quality transcripts
20 Which transcript to use? GENCODE Basic: only the complete transcripts (where a gene has complete transcripts). Transcript support level: scored 1-5 for quality, where 1 is the best. APPRIS principal isoform: the major isoform(s) from combining protein structural information, functionally important residues and evidence from cross-species alignments. Also: CCDS and golden transcripts.
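One way to combine these quality flags when choosing a single transcript is sketched below. The slide lists the criteria but does not rank them, so the priority order here is an assumption, and the `Transcript` class, its field names and the IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transcript:
    stable_id: str
    gencode_basic: bool       # complete transcript in the GENCODE Basic set
    tsl: Optional[int]        # transcript support level, 1 (best) to 5; None if unscored
    appris_principal: bool    # flagged as an APPRIS principal isoform
    ccds: bool                # matches a CCDS record
    golden: bool              # identical Ensembl and Havana annotation


def rank_key(t: Transcript) -> tuple:
    """Lower tuples sort first. The order of criteria is an illustrative choice."""
    return (
        not t.gencode_basic,
        not t.appris_principal,
        not t.ccds,
        not t.golden,
        t.tsl if t.tsl is not None else 6,  # unscored sorts after TSL 5
    )


def pick_transcript(transcripts: List[Transcript]) -> Transcript:
    return min(transcripts, key=rank_key)


# Made-up transcripts for demonstration only.
candidates = [
    Transcript("ENST00000000001", True, 2, False, False, False),
    Transcript("ENST00000000002", True, 1, True, True, True),
    Transcript("ENST00000000003", False, None, False, False, False),
]
best = pick_transcript(candidates)
```

Swapping the order of the tuple in `rank_key` changes which flag wins a tie, which is the part most worth adapting to the analysis at hand.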
21 Ensembl stable IDs ENSG########### Ensembl Gene ID; ENST########### Ensembl Transcript ID; ENSP########### Ensembl Peptide ID; ENSE########### Ensembl Exon ID. For non-human species a species suffix is added to the ENS prefix: MUS (Mus musculus) for mouse: ENSMUSG###; DAR (Danio rerio) for zebrafish: ENSDARG###
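The stable ID pattern on this slide can be taken apart with a short regular expression. The sketch below assumes the 11-digit form shown above, knows only the species codes named on the slide, and ignores version suffixes such as `.5`; the example IDs are placeholders, not real records.

```python
import re
from typing import NamedTuple, Optional

FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}
# Only the species codes mentioned on the slide; the empty code means human.
SPECIES_CODES = {"": "human", "MUS": "mouse", "DAR": "zebrafish"}

# ENS + optional species code + feature letter + 11 digits.
STABLE_ID = re.compile(r"^ENS([A-Z]*?)([GTPE])(\d{11})$")


class StableId(NamedTuple):
    species_code: str
    species: Optional[str]  # None when the code is not in SPECIES_CODES
    feature: str
    number: int


def parse_stable_id(stable_id: str) -> StableId:
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not an Ensembl-style stable ID: {stable_id!r}")
    code, letter, digits = match.groups()
    return StableId(code, SPECIES_CODES.get(code), FEATURE_TYPES[letter], int(digits))


human_gene = parse_stable_id("ENSG00000000001")
mouse_gene = parse_stable_id("ENSMUSG00000000001")
fish_transcript = parse_stable_id("ENSDART00000000001")
```

The species code is matched lazily so that a code containing G, T, P or E is not mistaken for the feature letter.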
22 Why Gene Ontology (GO)? Multiple terms for the same thing: non-specific immunity, innate immunity. Gene descriptions too specific: natural killer cells, cytokines, complement, phagocyte, mast cells.
23 GO terms form a controlled vocabulary GO: innate immune response Innate immune responses are defense responses mediated by germline encoded components that directly recognise components of potential pathogens.
24 GO terms are hierarchical [Figure: GO term hierarchy; the GO accession numbers were lost in extraction. Terms shown: immune response; innate immune response; defence response; complement activation, alternative pathway; complement activation, lectin pathway; hemolymph coagulation; induced systemic resistance; MAPK cascade involved in innate immune response; defence response, incompatible interaction; response to type I interferon; response to type II interferon; response to interferon-gamma; innate immune response in mucosa; virus induced gene silencing; melanisation defence response; natural killer cell mediated immunity; plant-type hypersensitive response; regulation of innate immune response; positive regulation of innate immune response; negative regulation of innate immune response]
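A hierarchy like this lets a search for a broad term also find genes annotated with more specific terms. The sketch below walks a toy parent map; the edges are illustrative and loosely based on the term names on the slide, not taken from the Gene Ontology itself, and term names stand in for the accession numbers.

```python
from typing import Dict, List, Set

# Toy child -> parents map (assumed edges, for illustration only).
PARENTS: Dict[str, List[str]] = {
    "immune response": [],
    "innate immune response": ["immune response"],
    "complement activation, alternative pathway": ["innate immune response"],
    "innate immune response in mucosa": ["innate immune response"],
}


def ancestors(term: str, parents: Dict[str, List[str]] = PARENTS) -> Set[str]:
    """Return every term reachable by following parent links upwards."""
    found: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


def matches(annotated_term: str, query_term: str) -> bool:
    """A gene annotated with a specific term also matches its broader ancestors."""
    return annotated_term == query_term or query_term in ancestors(annotated_term)


specific = "complement activation, alternative pathway"
broader = ancestors(specific)
```

With this map, a gene annotated with the complement term is found by a query for "immune response", but not the other way round.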
25 Hands on We're going to look at an Ensembl gene, ESPN, and find out information about it and its transcripts.
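The demo itself uses the browser website. For readers who prefer a script, the same gene could also be looked up through the public Ensembl REST service; the sketch below is a minimal example under that assumption, and the helper names `build_lookup_url` and `fetch_gene` are hypothetical. Nothing is fetched until `fetch_gene` is called, and it needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

REST_SERVER = "https://rest.ensembl.org"


def build_lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    """Build a symbol-lookup URL; expand=1 asks for the gene's transcripts too."""
    return f"{REST_SERVER}/lookup/symbol/{quote(species)}/{quote(symbol)}?expand=1"


def fetch_gene(symbol: str, species: str = "homo_sapiens") -> dict:
    """Request the gene record as JSON (performs a live HTTP request)."""
    request = Request(
        build_lookup_url(symbol, species),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


espn_url = build_lookup_url("ESPN")
```

Calling `fetch_gene("ESPN")` should return a dictionary describing the gene; the exact fields are best checked against the REST documentation rather than assumed.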
26 Next webinar Data export with BioMart Ensembl data can be easily exported in bulk using BioMart. BioMart is a flexible tool that allows you to easily specify what Ensembl features you want data for, and what data you want to see about them, then export those data in a table or as sequences. Learn the basics of running a BioMart query, and explore some of the options that are available. Victoria Newman
30 Help and documentation Course online Tutorials Flash animations us Ensembl public mailing lists
31 Follow us
32 Publications Aken, B. et al. Ensembl 2017. Nucleic Acids Research; Xosé M. Fernández-Suárez and Michael K. Schuster. Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in Bioinformatics (2010); Giulietta M. Spudich and Xosé M. Fernández-Suárez. Touring Ensembl: A practical guide to genome browsing. BMC Genomics 11:295 (2010)
33 Ensembl Acknowledgements The Entire Ensembl Team Funding Co-funded by the European Union