Training materials Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ If you wish to re-use these materials, please credit Ensembl for their creation If you use Ensembl for your work, please cite our papers http://www.ensembl.org/info/about/publications.html
Exploring Genes and Variants with Ensembl Helen Sparrow Ensembl Outreach EMBL-EBI
Today 11.00-11.45 Introduction to Genome Browsers Genes and Transcripts Variation 14.00-15.30 or 16.00-17.30 Browsing Genes and Transcripts Variants Assembly Converter
Tomorrow: 11-11.45 Variant Effect Predictor The VEP determines the effects of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Objectives What is Ensembl? What type of data can you get in Ensembl? How to navigate the Ensembl browser website How to use Ensembl tools Where to go for help and documentation
Course materials tinyurl.com/vepcrete Presentation Coursebook (screenshots of demo) Exercises and Answers
Introduction Why do we need genome browsers? 1977: 1st genome to be sequenced (5 kb) 2004: Finished human sequence (3 Gb) 2015: Completion of 1000 Genomes project
Why do we need Genome Browsers? CGGCCTTTGGGCTCCGCCTTCAGCTCAAGACTTAACTTCCCTCCCAGCTGTCCCAGATGACGCCATCTGAAATTTCTTGGAAACACGATCACTTTAACGGAATATTGCTGTTTTGGGGAAGTGTTTTACAGCTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAAT TGCTGTATTCCGAAGACATGCTGATGGGAATTACCAGGCGGCGTTGGTCTCTAACTGGAGCCCTCTGTCCCCACTAGCCACGCGTCACTGGTTAGCGTGATTGAAACTAAATCGTATGAAAATCCTCTTCTCTAGTCGCACTAGCCACGTTTCGAGTGCTTAATGTGGCTAGTG GCACCGGTTTGGACAGCACAGCTGTAAAATGTTCCCATCCTCACAGTAAGCTGTTACCGTTCCAGGAGATGGGACTGAATTAGAATTCAAACAAATTTTCCAGCGCTTCTGAGTTTTACCTCAGTCACATAATAAGGAATGCATCCCTGTGTAAGTGCATTTTGGTCTTCTGTT TTGCAGACTTATTTACCAAGCATTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGGTATTGACAAATTTTATATAACTTTATAAATTACACCGAGAAAGTGTTTTCTAAAAAATGCT TGCTAAAAACCCAGTACGTCACAGTGTTGCTTAGAACCATAAACTGTTCCTTATGTGTGTATAAATCCAGTTAACAACATAATCATCGTTTGCAGGTTAACCACATGATAAATATAGAACGTCTAGTGGATAAAGAGGAAACTGGCCCCTTGACTAGCAGTAGGAACAATTACT AACAAATCAGAAGCATTAATGTTACTTTATGGCAGAAGTTGTCCAACTTTTTGGTTTCAGTACTCCTTATACTCTTAAAAATGATCTAGGACCCCCGGAGTGCTTTTGTTTATGTAGCTTACCATATTAGAAATTTAAAACTAAGAATTTAAGGCTGGGCGTGGTGGCTCACGC CTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACTTGAGGCCAGAAGTTTGAGACCAGCCTGGCCAACATGGTGAAACCCTATCTCTACTAAAAATACAAAAAATGTGCTGCGTGTGGTGGTGCGTGCCTGTAATCCCAGCTACACGGGAGGTGGAGGCAGGAGAA TCGCTTGAACCCTGGAGGCAGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCTAGCCTGGGCCACATAGCATGACTCTGTCTCAAAACAAACAAACAAACAAAAAACTAAGAATTTAAAGTTAATTTACTTAAAAATAATGAAAGCTAACCCATTGCATATTATCACAAC ATTCTTAGGAAAAATAACTTTTTGAAAACAAGTGAGTGGAATAGTTTTTACATTTTTGCAGTTCTCTTTAATGTCTGGCTAAATAGAGATAGCTGGATTCACTTATCTGTGTCTAATCTGTTATTTTGGTAGAAGTATGTGAAAAAAAATTAACCTCACGTTGAAAAAAGGAAT ATTTTAATAGTTTTCAGTTACTTTTTGGTATTTTTCCTTGTACTTTGCATAGATTTTTCAAAGATCTAATAGATATACCATAGGTCTTTCCCATGTCGCAACATCATGCAGTGATTATTTGGAAGATAGTGGTGTTCTGAATTATACAAAGTTTCCAAATATTGATAAATTGCA TTAAACTATTTTAAAAATCTCATTCATTAATACCACCATGGATGTCAGAAAAGTCTTTTAAGATTGGGTAGAAATGAGCCACTGGAAATTCTAATTTTCATTTGAAAGTTCACATTTTGTCATTGACAACAAACTGTTTTCCTTGCAGCAACAAGATCACTTCATTGATTTGTG AGAAAATGTCTACCAAATTATTTAAGTTGAAATAACTTTGTCAGCTGTTCTTTCAAGTAAAAATGACTTTTCATTGAAAAAATTGCTTGTTCAGATCACAGCTCAACATGAGTGCTTTTCTAGGCAGTATTGTACTTCAGTATGCAGAAGTGCTTTATGTATGCTTCCTATTTT GTCAGAGATTATTAAAAGAAGTGCTAAAGCATTGAGCTTCGAAATTAATTTTTACTGCTTCATTAGGACATTCTTACATTAAACTGGCATTATTATTACTATTATTTTTAACAAGGACACTCAGTGGTAAGGAATATAATGGCTACTAGTATTAGTTTGGTGCCACTGCCATAA CTCATGCAAATGTGCCAGCAGTTTTACCCAGCATCATCTTTGCACTGTTGATACAAATGTCAACATCATGAAAAAGGGTTGAAAAAAGGAATATTTTAATAGTTTTCAGTTACTTTTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAATTGCTGTATTCCGAAGA CATGCTGATGGGAATTACCAGGCGGCGTTGGTCTCTAACTGGAGCCCTCTGTCCCCACTAGCCACGCGTCACTGGTTAGCGTGATTGAAACTAAATCGTATGAAAATCCTCTTCTCTAGTCGCACTAGCCACGTTTCGAGTGCTTAATGTGGCTAGTGGCACCGGTTTGGACAG CACAGCTGTAAAATGTTCCCATCCTCACAGTAAGCTGTTACCGTTCCAGGAGATGGGACTGAATTAGAATTCAAACAAATTTTCCAGCGCTTCTGAGTTTTACCTCAGTCACATAATAAGGAATGCATCCCTGTGTAAGTGCATTTTGGTCTTCTGTTTTGCAGACTTATTTAC CAAGCATTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGGTATTGACAAATTTTATATAACTTTATAAATTACACCGAGAAAGTGTTTTCTAAAAAATGCTTGCTAAAAACCCAGTA CGTCACAGTGTTGCTTAGAACCATAAACTGTTCCTTATGTGTGTATAAATCCAGTTAACAACATAATCATCGTTTGCAGGTTAACCACATGATAAATATAGAACGTCTAGTGGATAAAGAGGAAACTGGCCCCTTGACTAGCAGTAGGAACAATTACTAACAAATCAGAAGCAT TAATGTTACTTTATGGCAGAAGTTGTCCAACTTTTTGGTTTCAGTACTCCTTATACTCTTAAAAATGATCTAGGACCCCCGGAGTGCTTTTGTTTATGTAGCTTACCATATTAGAAATTTAAAACTAAGAATTTAAGGCTGGGCGTGGTGGCTCACGCCTGTAATCCCAGCACT TTGGGAGGCCGAGGTGGGCGGATCACTTGAGGCCAGAAGTTTGAGACCAGCCTGGCCAACATGGTGAAACCCTATCTCTACTAAAAATACAAAAAATGTGCTGCGTGTGGTGGTGCGTGCCTGTAATCCCAGCTACACGGGAGGTGGAGGCAGGAGAATCGCTTGAACCCTGGA GGCAGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCTAGCCTGGGCCACATAGCATGACTCTGTCTCAAAACAAACAAACAAACAAAAAACTAAGAATTTAAAGTTAATTTACTTAAAAATAATGAAAGCTAACCCATTGCATATTATCACAACATTCTTAGGAAAAATA ACTTTTTGAAAACAAGTGAGTGGAATAGTTTTTACATTTTTGCAGTTCTCTTTAATGTCTGGCTAAATAGAGATAGCTGGATTCACTTATCTGTGTCTAATCTGTTATTTTGGTAGAAGTATGTGAAAAAAAATTAACCTCACGTTGAAAAAAGGAATATTTTAATAGTTTTCA GTTACTTTTTGGTATTTTTCCTTGTACTTTGCATAGATTTTTCAAAGATCTAATAGATATACCATAGGTCTTTCCCATGTCGCAACATCATGCAGTGATTATTTGGAAGATAGTGGTGTTCTGAATTATACAAAGTTTCCAAATATTGATAAATTGCAGATAAATTGCATTAAA CTATTTTAAAAATCTCATTCATTAATACCACCATGGATGTCAGAAAAGTCTTTTAAGATTGGGTAGAAATGAGCCACTGGAAATTCTAATTTTCATTTGAAAGTTCACATTTTGTCATTGACAACAAACTGTTTTCCTTGCAGCAACAAGATCACTTCATTGATTTGTGAGAAA ATGTCTACCAAATTATTTAAGTTGAAATAACTTTGTCAGCTGTTCTTTCAAGTAAAAATGACTTTTCATTGAAAAAATTGCTTGTTCAGATCACAGCTCAACATGAGTGCTTTTCTAGGCAGTATTGTACTTCAGTATGCAGAAGTGCTTTATGTATGCTTCCTATTTTGTCAG AGATTATTAAAAGAAGTGCTAAAGCATTGAGCTTCGAAATTAATTTTTACTGCTTCATTAGGACATTCTTACATTAAACTGGCATTATTATTACTATTATTTTTAACAAGGACACTCAGTGGTAAGGAATATAATGGCTACTAGTATTAGTTTGGTGCCACTGCCATAACTCAT GCAAATGTGCCAGCAGTTTTACCCAGCATCATCTTTGCACTGTTGATACAAATGTCAACATCATGAAAAAGGGTTGAAAAAAGGAATATTTTAATAGTTTTCAGTTACTTTTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAATTGCTGTATTCCGAAGACATGC TGATGGGAATTACCAGGCGGCGTTGGTCTCTAACTGGAGCCCTCTGTCCCCACTAGCCACGCGTCACTGGTTAGCGTGATTGAAACTAAATCGTATGAAAATCCTCTTCTCTAGTCGCACTAGCCACGTTTCGAGTGCTTAATGTGGCTAGTGGCACCGGTTTGGACAGCACAG CTGTAAAATGTTCCCATCCTCACAGTAAGCTGTTACCGTTCCAGGAGATGGGACTGAATTAGAATTCAAACAAATTTTCCAGCGCTTCTGAGTTTTACCTCAGTCACATAATAAGGAATGCATCCCTGTGTAAGTGCATTTTGGTCTTCTGTTTTGCAGACTTATTTACCAAGC ATTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGGTATTGACAAATTTTATATAACTTTATAAATTACACCGAGAAAGTGTTTTCTAAAAAATGCTTGCTAAAAACCCAGTACGTCA CAGTGTTGCTTAGAACCATAAACTGTTCCTTATGTGTGTATAAATCCAGTTAACAACATAATCATCGTTTGCAGGTTAACCACATGATAAATATAGAACGTCTAGTGGATAAAGAGGAAACTGGCCCCTTGACTAGCAGTAGGAACAATTACTAACAAATCAGAAGCATTAATG TTACTTTATGGCAGAAGTTGTCCAACTTTTTGGTTTCAGTACTCCTTATACTCTTAAAAATGATCTGGCTAAATAGAGATAGCTGGATTCACTTATCTGTGTCTAATCTGTTATTTTGGTAGAAGTATGTGAAAAAAAATTAACCTCACGTTGAAAAAAGGAATATTTTAATAG TTTTCAGTTACTTTTTGGTATTTTTCCTTGTACTTTGCATAGATTTTTCAAAGATCTAATAGATATACCATAGGTCTTTCCCATGTCGCAACATCATGCAGTGATTATTTGGAAGATAGTGGTGTTCTGAATTATACAAAGTTTCCAAATATTGATAAATTGCAGATAAATTGC ATTAAACTATTTTAAAAATCTCATTCATTAATACCACCATGGATGTCAGAAAAGTCTTTTAAGATTGGGTAGAAATGAGCCACTGGAAATTCTAATTTTCATTTGAAAGTTCACATTTTGTCATTGACAACAAACTGTTTTCCTTGCAGCAACAAGATCACTTCATTGATTTGT GAGAAAATGTCTACCAAATTATTTAAGTTGAAATAACTTTGTCAGCTGTTCTTTCAAGTAAAAATGACTTTTCATTGAAAAAATTGCTTGTTCAGATCACAGCTCAACATGAGTGCTTTTCTAGGCAGTATTGTACTTCAGTATGCAGAAGTGCTTTATGTATGCTTCCTATTT TGTCAGAGATTATTAAAAGAAGTGCTAAAGCATTGAGCTTCGAAATTAATTTTTACTGCTTCATTAGGACATTCTTACATTAAACTGGCATTATTATTACTATTATTTTTAACAAGGACACTCAGTGGTAAGGAATATAATGGCTACTAGTATTAGTTTGGTGCCACTGCCATA ACTCATGCAAATGTGCCAGCAGTTTTACCCAGCATCATCTTTGCACTGTTGATACAAATGTCAACATCATGAAAAAGGGTTGAAAAAAGGAATATTTTAATAGTTTTCAGTTACTTTTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAATTGCTGTATTCCGAAG Buried in the genome sequence is information about: Function- such as transcripts and proteins Expression- regulatory elements Variation- SNPs, structural variants
Why do we need Genome Browsers? CGGCCTTTGGGCTCCGCCTTCAGCTCAAGACTTAACTTCCCTCCCAGCTGTCCCAGATGACGCCATCTGAAATTTCTTGGAAACACGATCACTTTAACGGAATATTGCTGTTTTGGGGAAGTGTTTTACAGCTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAAT TGCTGTATTCCGAAGACATGCTGATGGGAATTACCAGGCGGCGTTGGTCTCTAACTGGAGCCCTCTGTCCCCACTAGCCACGCGTCACTGGTTAGCGTGATTGAAACTAAATCGTATGAAAATCCTCTTCTCTAGTCGCACTAGCCACGTTTCGAGTGCTTAATGTGGCTAGTG GCACCGGTTTGGACAGCACAGCTGTAAAATGTTCCCATCCTCACAGTAAGCTGTTACCGTTCCAGGAGATGGGACTGAATTAGAATTCAAACAAATTTTCCAGCGCTTCTGAGTTTTACCTCAGTCACATAATAAGGAATGCATCCCTGTGTAAGTGCATTTTGGTCTTCTGTT TTGCAGACTTATTTACCAAGCATTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGGTATTGACAAATTTTATATAACTTTATAAATTACACCGAGAAAGTGTTTTCTAAAAAATGCT TGCTAAAAACCCAGTACGTCACAGTGTTGCTTAGAACCATAAACTGTTCCTTATGTGTGTATAAATCCAGTTAACAACATAATCATCGTTTGCAGGTTAACCACATGATAAATATAGAACGTCTAGTGGATAAAGAGGAAACTGGCCCCTTGACTAGCAGTAGGAACAATTACT AACAAATCAGAAGCATTAATGTTACTTTATGGCAGAAGTTGTCCAACTTTTTGGTTTCAGTACTCCTTATACTCTTAAAAATGATCTAGGACCCCCGGAGTGCTTTTGTTTATGTAGCTTACCATATTAGAAATTTAAAACTAAGAATTTAAGGCTGGGCGTGGTGGCTCACGC CTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACTTGAGGCCAGAAGTTTGAGACCAGCCTGGCCAACATGGTGAAACCCTATCTCTACTAAAAATACAAAAAATGTGCTGCGTGTGGTGGTGCGTGCCTGTAATCCCAGCTACACGGGAGGTGGAGGCAGGAGAA TCGCTTGAACCCTGGAGGCAGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCTAGCCTGGGCCACATAGCATGACTCTGTCTCAAAACAAACAAACAAACAAAAAACTAAGAATTTAAAGTTAATTTACTTAAAAATAATGAAAGCTAACCCATTGCATATTATCACAAC ATTCTTAGGAAAAATAACTTTTTGAAAACAAGTGAGTGGAATAGTTTTTACATTTTTGCAGTTCTCTTTAATGTCTGGCTAAATAGAGATAGCTGGATTCACTTATCTGTGTCTAATCTGTTATTTTGGTAGAAGTATGTGAAAAAAAATTAACCTCACGTTGAAAAAAGGAAT ATTTTAATAGTTTTCAGTTACTTTTTGGTATTTTTCCTTGTACTTTGCATAGATTTTTCAAAGATCTAATAGATATACCATAGGTCTTTCCCATGTCGCAACATCATGCAGTGATTATTTGGAAGATAGTGGTGTTCTGAATTATACAAAGTTTCCAAATATTGATAAATTGCA TTAAACTATTTTAAAAATCTCATTCATTAATACCACCATGGATGTCAGAAAAGTCTTTTAAGATTGGGTAGAAATGAGCCACTGGAAATTCTAATTTTCATTTGAAAGTTCACATTTTGTCATTGACAACAAACTGTTTTCCTTGCAGCAACAAGATCACTTCATTGATTTGTG AGAAAATGTCTACCAAATTATTTAAGTTGAAATAACTTTGTCAGCTGTTCTTTCAAGTAAAAATGACTTTTCATTGAAAAAATTGCTTGTTCAGATCACAGCTCAACATGAGTGCTTTTCTAGGCAGTATTGTACTTCAGTATGCAGAAGTGCTTTATGTATGCTTCCTATTTT GTCAGAGATTATTAAAAGAAGTGCTAAAGCATTGAGCTTCGAAATTAATTTTTACTGCTTCATTAGGACATTCTTACATTAAACTGGCATTATTATTACTATTATTTTTAACAAGGACACTCAGTGGTAAGGAATATAATGGCTACTAGTATTAGTTTGGTGCCACTGCCATAA CTCATGCAAATGTGCCAGCAGTTTTACCCAGCATCATCTTTGCACTGTTGATACAAATGTCAACATCATGAAAAAGGGTTGAAAAAAGGAATATTTTAATAGTTTTCAGTTACTTTTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAATTGCTGTATTCCGAAGA CATGCTGATGGGAATTACCAGGCGGCGTTGGTCTCTAACTGGAGCCCTCTGTCCCCACTAGCCACGCGTCACTGGTTAGCGTGATTGAAACTAAATCGTATGAAAATCCTCTTCTCTAGTCGCACTAGCCACGTTTCGAGTGCTTAATGTGGCTAGTGGCACCGGTTTGGACAG CACAGCTGTAAAATGTTCCCATCCTCACAGTAAGCTGTTACCGTTCCAGGAGATGGGACTGAATTAGAATTCAAACAAATTTTCCAGCGCTTCTGAGTTTTACCTCAGTCACATAATAAGGAATGCATCCCTGTGTAAGTGCATTTTGGTCTTCTGTTTTGCAGACTTATTTAC CAAGCATTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGGTATTGACAAATTTTATATAACTTTATAAATTACACCGAGAAAGTGTTTTCTAAAAAATGCTTGCTAAAAACCCAGTA CGTCACAGTGTTGCTTAGAACCATAAACTGTTCCTTATGTGTGTATAAATCCAGTTAACAACATAATCATCGTTTGCAGGTTAACCACATGATAAATATAGAACGTCTAGTGGATAAAGAGGAAACTGGCCCCTTGACTAGCAGTAGGAACAATTACTAACAAATCAGAAGCAT TAATGTTACTTTATGGCAGAAGTTGTCCAACTTTTTGGTTTCAGTACTCCTTATACTCTTAAAAATGATCTAGGACCCCCGGAGTGCTTTTGTTTATGTAGCTTACCATATTAGAAATTTAAAACTAAGAATTTAAGGCTGGGCGTGGTGGCTCACGCCTGTAATCCCAGCACT TTGGGAGGCCGAGGTGGGCGGATCACTTGAGGCCAGAAGTTTGAGACCAGCCTGGCCAACATGGTGAAACCCTATCTCTACTAAAAATACAAAAAATGTGCTGCGTGTGGTGGTGCGTGCCTGTAATCCCAGCTACACGGGAGGTGGAGGCAGGAGAATCGCTTGAACCCTGGA GGCAGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCTAGCCTGGGCCACATAGCATGACTCTGTCTCAAAACAAACAAACAAACAAAAAACTAAGAATTTAAAGTTAATTTACTTAAAAATAATGAAAGCTAACCCATTGCATATTATCACAACATTCTTAGGAAAAATA ACTTTTTGAAAACAAGTGAGTGGAATAGTTTTTACATTTTTGCAGTTCTCTTTAATGTCTGGCTAAATAGAGATAGCTGGATTCACTTATCTGTGTCTAATCTGTTATTTTGGTAGAAGTATGTGAAAAAAAATTAACCTCACGTTGAAAAAAGGAATATTTTAATAGTTTTCA GTTACTTTTTGGTATTTTTCCTTGTACTTTGCATAGATTTTTCAAAGATCTAATAGATATACCATAGGTCTTTCCCATGTCGCAACATCATGCAGTGATTATTTGGAAGATAGTGGTGTTCTGAATTATACAAAGTTTCCAAATATTGATAAATTGCAGATAAATTGCATTAAA CTATTTTAAAAATCTCATTCATTAATACCACCATGGATGTCAGAAAAGTCTTTTAAGATTGGGTAGAAATGAGCCACTGGAAATTCTAATTTTCATTTGAAAGTTCACATTTTGTCATTGACAACAAACTGTTTTCCTTGCAGCAACAAGATCACTTCATTGATTTGTGAGAAA ATGTCTACCAAATTATTTAAGTTGAAATAACTTTGTCAGCTGTTCTTTCAAGTAAAAATGACTTTTCATTGAAAAAATTGCTTGTTCAGATCACAGCTCAACATGAGTGCTTTTCTAGGCAGTATTGTACTTCAGTATGCAGAAGTGCTTTATGTATGCTTCCTATTTTGTCAG AGATTATTAAAAGAAGTGCTAAAGCATTGAGCTTCGAAATTAATTTTTACTGCTTCATTAGGACATTCTTACATTAAACTGGCATTATTATTACTATTATTTTTAACAAGGACACTCAGTGGTAAGGAATATAATGGCTACTAGTATTAGTTTGGTGCCACTGCCATAACTCAT GCAAATGTGCCAGCAGTTTTACCCAGCATCATCTTTGCACTGTTGATACAAATGTCAACATCATGAAAAAGGGTTGAAAAAAGGAATATTTTAATAGTTTTCAGTTACTTTTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAATTGCTGTATTCCGAAGACATGC TGATGGGAATTACCAGGCGGCGTTGGTCTCTAACTGGAGCCCTCTGTCCCCACTAGCCACGCGTCACTGGTTAGCGTGATTGAAACTAAATCGTATGAAAATCCTCTTCTCTAGTCGCACTAGCCACGTTTCGAGTGCTTAATGTGGCTAGTGGCACCGGTTTGGACAGCACAG CTGTAAAATGTTCCCATCCTCACAGTAAGCTGTTACCGTTCCAGGAGATGGGACTGAATTAGAATTCAAACAAATTTTCCAGCGCTTCTGAGTTTTACCTCAGTCACATAATAAGGAATGCATCCCTGTGTAAGTGCATTTTGGTCTTCTGTTTTGCAGACTTATTTACCAAGC ATTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGGTATTGACAAATTTTATATAACTTTATAAATTACACCGAGAAAGTGTTTTCTAAAAAATGCTTGCTAAAAACCCAGTACGTCA CAGTGTTGCTTAGAACCATAAACTGTTCCTTATGTGTGTATAAATCCAGTTAACAACATAATCATCGTTTGCAGGTTAACCACATGATAAATATAGAACGTCTAGTGGATAAAGAGGAAACTGGCCCCTTGACTAGCAGTAGGAACAATTACTAACAAATCAGAAGCATTAATG TTACTTTATGGCAGAAGTTGTCCAACTTTTTGGTTTCAGTACTCCTTATACTCTTAAAAATGATCTGGCTAAATAGAGATAGCTGGATTCACTTATCTGTGTCTAATCTGTTATTTTGGTAGAAGTATGTGAAAAAAAATTAACCTCACGTTGAAAAAAGGAATATTTTAATAG TTTTCAGTTACTTTTTGGTATTTTTCCTTGTACTTTGCATAGATTTTTCAAAGATCTAATAGATATACCATAGGTCTTTCCCATGTCGCAACATCATGCAGTGATTATTTGGAAGATAGTGGTGTTCTGAATTATACAAAGTTTCCAAATATTGATAAATTGCAGATAAATTGC ATTAAACTATTTTAAAAATCTCATTCATTAATACCACCATGGATGTCAGAAAAGTCTTTTAAGATTGGGTAGAAATGAGCCACTGGAAATTCTAATTTTCATTTGAAAGTTCACATTTTGTCATTGACAACAAACTGTTTTCCTTGCAGCAACAAGATCACTTCATTGATTTGT GAGAAAATGTCTACCAAATTATTTAAGTTGAAATAACTTTGTCAGCTGTTCTTTCAAGTAAAAATGACTTTTCATTGAAAAAATTGCTTGTTCAGATCACAGCTCAACATGAGTGCTTTTCTAGGCAGTATTGTACTTCAGTATGCAGAAGTGCTTTATGTATGCTTCCTATTT TGTCAGAGATTATTAAAAGAAGTGCTAAAGCATTGAGCTTCGAAATTAATTTTTACTGCTTCATTAGGACATTCTTACATTAAACTGGCATTATTATTACTATTATTTTTAACAAGGACACTCAGTGGTAAGGAATATAATGGCTACTAGTATTAGTTTGGTGCCACTGCCATA ACTCATGCAAATGTGCCAGCAGTTTTACCCAGCATCATCTTTGCACTGTTGATACAAATGTCAACATCATGAAAAAGGGTTGAAAAAAGGAATATTTTAATAGTTTTCAGTTACTTTTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAATTGCTGTATTCCGAAG http://www.ncbi.nlm. nih.gov/mapview http://ensembl.org http://genome.ucsc.edu
Ensembl - more than a browser Import Genome Assemblies Annotate genomes - Genes and transcripts - Variants - Regulatory features - Comparative Genomics Display and export Tools
Ensembl Features Gene builds for ~70 species Gene trees Regulatory build (ENCODE) Variation display and VEP Display of user data BioMart (data export) Programmatic access via the APIs Completely Open Source
Reference Genome Assemblies GRCh38 www.ensembl.org (UCSC equivalent is hg38) Current, supported GRCh37 250 gaps http://grch37.ensembl.org/ (UCSC hg19) Limited data updates NCBI36 150,000 gaps http://may2009.archive.ensembl.org/ (UCSC hg18) No updates
Reference Genome contigs BL102 AL476 BL AL CM553 CM IM768 IM
Genes and Transcripts EBI is an Outstation of the European Molecular Biology Laboratory.
Ensembl and Havana annotation Automatic annotation Manual annotation
Automatic gene annotation Genome-wide determination using the Ensembl automated pipeline Predictions based on experimental (biological) data Predictions based on the genomic sequence (ab initio)
Transcript annotation
Other species Infer genes from homology to other species Eg predict genes in from to the RNAseq data by mapping cdnas/proteins genome
Manual gene annotation Gene determination on a case-by-case basis by a person Genome-wide Genes list vega.sanger.ac.uk
GENCODE The GENCODE gene set is made up of: Ensembl automatically annotated genes Havana manually annotated genes The merged gene set default gene set for ENCODE 1000 genomes and lots of other major projects
Merged Golden transcripts Identical annotation Higher confidence and quality CCDS transcripts Consensus coding DNA sequence set Agreement between EBI, WTSI, UCSC and NCBI
Transcript views
Ensembl stable IDs ENSG########### ENST########### ENSP########### ENSE########### Ensembl Gene ID Ensembl Transcript ID Ensembl Peptide ID Ensembl Exon ID For non-human species a suffix is added: MUS (Mus musculus) for mouse ENSMUSG### DAR (Danio rerio) for zebrafish: ENSDARG### http://www.ensembl.org/info/genome/stable_ids/index.html
Why Gene Ontology (GO)? Multiple terms for the same thing Gene descriptions too specific Non-specific immunity Innate immunity Natural killer cells Cytokines Complement Phagocyte Mast cells
GO terms form a controlled vocabulary GO:0045087 - innate immune response Innate immune responses are defense responses mediated by germline encoded components that directly recognise components of potential pathogens.
GO terms are hierarchical GO:0006955 immune response GO:0045087 GO:0006957 innate immune response complement activation, alternative pathway GO:0042381 hemolymph coagulation GO:0009682 induced systemic resistance GO:0035420 MAPK cascade involved in innate immune response GO:0009814 defence response, incompatible interaction GO:0034342 GO:0001867 GO:0002227 GO:0009616 response to type II interferon complement activation, lectin pathway innate immune response in mucosa virus induced gene silencing GO:0034340 GO:0035006 GO:0002228 GO:0034341 response to type I interferon melanisation defence response natural killer cell mediated immunity response to interferon-gamma GO:0045089 GO:0009626 GO:0045824 GO:0045088 positive reg of innate immune response plant-type hypersensitive response negative reg of innate immune response regulation of innate immune response
Questions?
Variation EBI is an Outstation of the European Molecular Biology Laboratory.
Variant sources http://www.ensembl.org/info/genome/variation/sources_documentation.html#homo_sapiens
Variant types 1) Small scale in one or few nucleotides of a gene Small deletions and insertions (DIPs or indels) Single nucleotide polymorphism (SNP) A G A C T T G A C C T G T C T - A A C T G G A T G A C T T G A C - T G T C T G A A C G G G A 2) Large scale (>50bp) in chromosomal structure (structural variant) Copy number variants (CNV) Large deletions/duplications, insertions, translocations deletion duplication insertion translocation
Variant consequences CODING Synonymous Regulatory CODING Missense AAAAAAA ATG 5 Upstream 5 UTR Splice site Intronic 3 UTR 3 Downstream
SO consequence terms http://www.ensembl.org/info/docs/variation/predicted_data.html
Reference alleles BL102 AL476 BL BL102 AGTCGTAGCTAGC TAGGCCATAGGCGA AL CM553 CM IM768 IM Frequency T = 0.05, frequency G = 0.95 G is the allele in all primates T causes disease susceptibility T is allele in the contig used so T is the reference allele and G is the alternate allele and alleles are T/G
Allele strand AGTCGTAGCTAGC T/GAGGCCATAGGCGA TCGCCTATGGCCT A/CGCTAGCTACGACT Exon sequence: TATGGCCTA/CGCTAGC Alleles in database = T/G Alleles in gene = A/C Alleles = A/C -ve strand or T/G +ve strand Alleles = A/C or T/G Often lack further info
Questions?
Help and documentation Course online http://www.ebi.ac.uk/training/online/subjects/11 Tutorials www.ensembl.org/info/website/tutorials Videos www.youtube.com/user/ensemblhelpdesk Email us helpdesk@ensembl.org
Host a FREE Workshop! Invite one of our outreach team to teach at your institution for free (except trainer s expenses) E-mail us: helpdesk@ensembl.org Browser Course ½-2 day course on the Ensembl browser, aimed at wet-lab scientists. 1-2 trainers. API course 2-4 day course on the Ensembl APIs (Perl or REST) aimed at bioinformaticians. 1-4 trainers.
Acknowledgements The Entire Ensembl Team Funding Co-funded by the European Union
Training materials Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ If you wish to re-use these materials, please credit Ensembl for their creation If you use Ensembl for your work, please cite our papers http://www.ensembl.org/info/about/publication s.html