Ensembl and ENA. High level overview and use cases. Denise Carvalho-Silva. Ensembl Outreach Team

Similar documents
Training materials.

Browsing Genes and Genomes with Ensembl

Ensembl Tools. EBI is an Outstation of the European Molecular Biology Laboratory.

Browsing Genes and Genomes with Ensembl

Training materials.

Guided tour to Ensembl

Annotating your variants: Ensembl Variant Effect Predictor (VEP) Helen Sparrow Ensembl EMBL-EBI 2nd November 2016

Browsing Genes and Genomes with Ensembl

Ensembl: A New View of Genome Browsing

The Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica

Ensembl workshop. Thomas Randall, PhD bioinformatics.unc.edu. handouts, papers, datasets

In silico variant analysis: Challenges and Pitfalls

Gene-centered resources at NCBI

Browsing Genomes with Ensembl

Genomics: Genome Browsing & Annota3on

Types of Databases - By Scope

The University of California, Santa Cruz (UCSC) Genome Browser

Browsing Genomes with Ensembl Genomes

Introduc)on to Databases and Resources Biological Databases and Resources

ELE4120 Bioinformatics. Tutorial 5

European Genome phenome Archive at the European Bioinformatics Institute. Helen Parkinson Head of Molecular Archives

BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

1 st transplant user training workshop Versailles, 12th-13th November 2012

Week 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

Briefly, this exercise can be summarised by the follow flowchart:

Evolutionary Genetics. LV Lecture with exercises 6KP. Databases

Bioinformatics for Proteomics. Ann Loraine

Overview of the next two hours...

Array-Ready Oligo Set for the Rat Genome Version 3.0

Aligning GENCODE and RefSeq transcripts By EMBL-EBI and NCBI

to proteomics data in the PRIDE database

Genetics and Bioinformatics

Chapter 2: Access to Information

ab initio and Evidence-Based Gene Finding

NCBI web resources I: databases and Entrez

Applied Bioinformatics

Sequence Based Function Annotation

The Gene Ontology Annotation (GOA) project application of GO in SWISS-PROT, TrEMBL and InterPro

Supplementary Information for:

Protein Bioinformatics Part I: Access to information

Introduction to NGS analyses

Genomes with Ensembl. Dr. Giulietta Spudich CNIO, of 21

Two Mark question and Answers

ELIXIR: data for molecular biology and points of entry for marine scientists

Databases/Resources on the web

Bioinformatics for Cell Biologists

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

user s guide Question 1

B I O I N F O R M A T I C S

Transcriptome Assembly, Functional Annotation (and a few other related thoughts)

Genomic Annotation Lab Exercise By Jacob Jipp and Marian Kaehler Luther College, Department of Biology Genomics Education Partnership 2010

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015

I nternet Resources for Bioinformatics Data and Tools

SeattleSNPs Interactive Tutorial: Database Inteface Entrez, dbsnp, HapMap, Perlegen

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Training materials.

Analysis of Microarray Data

Interpreting Genome Data for Personalised Medicine. Professor Dame Janet Thornton EMBL-EBI

Overview of Health Informatics. ITI BMI-Dept

Gene-centered databases and Genome Browsers

Gene-centered databases and Genome Browsers

This software/database/presentation is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part

G4120: Introduction to Computational Biology

BIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP

The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28.

Evidence of Purifying Selection in Humans. John Long Mentor: Angela Yen (Kellis Lab)

Analysis of Microarray Data

How to use Variant Effects Report

Computational Challenges in Life Sciences Research Infrastructures

White paper on de novo assembly in CLC Assembly Cell 4.0

What is Bioinformatics?

INTRODUCTION TO BIOINFORMATICS. SAINTS GENETICS Ian Bosdet

MAKING WHOLE GENOME ALIGNMENTS USABLE FOR BIOLOGISTS. EXAMPLES AND SAMPLE ANALYSES.

Investigating Inherited Diseases

Hands-On Four Investigating Inherited Diseases

Grundlagen der Bioinformatik Summer Lecturer: Prof. Daniel Huson

ChroMoS Guide (version 1.2)

Niemann-Pick Type C Disease Gene Variation Database ( )

Using VarSeq to Improve Variant Analysis Research

Investigation of Genomic Variation in the Rising Era of Individual Genome Sequence: A Primer on Some Available Datasets and Structures

Tutorial section. VEGA, the genome browser with a difference

Introduction to Bioinformatics. What are the goals of the course? Who is taking this course? Different user needs, different approaches

BABELOMICS: Microarray Data Analysis

The EMBL-Bioinformatics and Data-Intensive Informatics

Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide.

Applied Bioinformatics

GREG GIBSON SPENCER V. MUSE

The i5k a pan-arthropoda Genome Database. Chris Childers and Monica Poelchau USDA-ARS, National Agricultural Library

Shannon pipeline plug-in: For human mrna splicing mutations CLC bio Genomics Workbench plug-in CLC bio Genomics Server plug-in Features and Benefits

SNPchiMp v A database to disentangle the SNPchip jungle. Latest update: January 2014.

Introduction to EMBL-EBI.

Variant prioritization in NGS studies: Annotation and Filtering "

Deakin Research Online

Consistent annotation of gene expression arrays

Access to genes and genomes with. Ensembl. Worked Example & Exercises

Product Applications for the Sequence Analysis Collection

Ensembl: A Genomic Toolset for pigs, poultry, plants, pests, pathogens and pollinators. Paul Kersey

ArrayExpress: Quick tour

Transcription:

Ensembl and ENA High level overview and use cases Denise Carvalho-Silva Ensembl Outreach Team On behalf of Ensembl and ENA teams European Molecular Biology Laboratories Euroepan Bioinformatics Institute SME Bioinformatics Forum Barcelona 8-9 October 2012

Outline Ensembl project: background and goals Data available Data access and Ensembl tools Use cases: Ensembl and ENA Ensembl Outreach and Support Acknowledgements

Ensembl project Launched in 1999: before the release of the draft of the human genome Joint project between the EBI and WTSI www.ensembl.org launched in March 2000

Goals Provide comprehensive annotation of genomes + many more Integrate the annotation with other biological data Make them all publicly available

Ensembl: an integration point 66 vertebrate genomes Release 68 July 2012

Annotation of non-vertebrate genomes Extends the use of Ensembl to other species Wider taxonomic range (v15, 354 genomes) www.ensemblgenomes.org 6 launched in 2009

Data available in Ensembl 68 Gene annotation for 66 vetebrate species Variation data for 19 species Comparative Genomics data for 69 species Regulation data for 16 species

Data access: browser sites www.ensembl.org pre.ensembl.org archive.ensembl.org

Data access: BioMart web interface to export Ensembl data no programming skills required DATASET FILTER ATTRIBUTES RESULTS www.ensembl.org/biomart/martview

BioMart results Tables/sequences Export/email

Data access: APIs and FTP Ensembl Database (open source): Perl-API, MySQL http://www.ensembl.org/info/docs/api/index.html FTP download site http://www.ensembl.org/info/data/ftp/index.html

http://www.ensembl.org/tools.html Ensembl Tools Assembly converter ID history converter Virtual Machine Region Report Variant Effect Predictor

Gene annotation Automatic pipeline Genome-wide determination + 63 species Manual curation Gene determination on a case-by-case by an annotator + gene lists 5 species

Gene annotation on the browser Exons are drawn as boxes. Filled boxes are translated (coding) exons, empty boxes are untranslated regions (UTRs). Havana (00_) Merged ( gold ) Ensembl (20_) Havana (00_) Merged ( gold ) gene set: identical annotation from Ensembl and Havana for human, mouse, zebrafish high confidence and quality

Biological Evidence International Nucleotide Sequence databases Protein sequence databases NCBI RefSeq RNAseq (transcriptomic) data

European Nucleotide Archive ENA provides a comprehensive, accessible and publicly available repository for nucleotide sequence data Data submission Data search/download http://www.ebi.ac.uk/ena/

Use case 1 - ENA Retrieve and browse the mitochondrial genome of the cave bear (Ursus spelaeus). Mo Hassan

Use case 2 - Ensembl and ENA I have submitted a DNA sequence to ENA and got the ID AF489725. Can I view this ID in Ensembl? Which gene is associated with? Which chromosome is the gene found on? What are the neighbouring genes? Is there a homologue to this gene in dog? Find the cdna alignment between the two genes Can I jump to ENA from Ensembl?

Use case 3 - Ensembl Our sequencing results identified a known SNP (rs4988235) in one of our samples in individuals from Barcelona (Spain). What is the major allele for this SNP? Is it the same in all 1000 Genomes super-populations? What is the ancestral allele? Is it conserved in vertebrates? Are there any phenotypes associated with this SNP? How many variants are associated with this phenotype? Which gene is associated to this phenotype?

Ensembl Outreach and Support Course online www.ensembl.info/ecourse Tutorials www.ensembl.org/info/website/tutorials YouTube channel www.youtube.com/user/ensemblhelpdesk Mailing lists announce@ensembl.org, dev@ensembl.org Comments and questions? helpdesk@ensembl.org

Acknowledgements Funded by the Wellcome Trust, NIH-NHGRI, EU and EMBL

Ensembl Team Retreat 2012 Norwich, United Kingdom

Acknowledgements Clara Amid, Ewan Birney, Lawrence Bower, Ana Cerdeño- Tárraga, Ying Cheng, Iain Cleland, Nadeem Faruque, Richard Gibson, Neil Goodgame, Christopher Hunter, Mikyung Jang, Rasko Leinonen, Xin Liu, Arnaud Oisel, Nima Pakseresht, Sheila Plaister, Rajesh Radhakrishnan, Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane Funded by EMBL, EU, Wellcome Trust, BBSRC datasubs@ebi.ac.uk update@ebi.ac.uk