Genomes with Ensembl. Dr. Giulietta Spudich, CNIO

1 Genomes with Ensembl Dr. Giulietta Spudich, CNIO

2 Overview of the day: Introduction and website walk-through; Hands-on exercises (the browser); Introduction to BioMart (data mining); Variations; Comparative Genomics; Upcoming Ensembl

3 Genome browsers provide a map: DNase I sensitive site, histone modification, gene, SNP, conserved sequence

4 Genome Browsers: Ensembl Genome Browser, NCBI Map Viewer, UCSC Genome Browser

5 Ensembl Genome Browser

6 NCBI Map Viewer

7 UCSC Genome Browser

8 What Distinguishes Ensembl from the UCSC and NCBI Browsers? The gene set: automatic annotation based on mRNA and protein information; programmatic access via the Perl API (open source); BioMart; integration with other databases (DAS); comparative analysis (gene trees)

9 To meet a challenge. Ensembl's aim: to provide annotation for the biological community that is freely available and of high quality. Started in 2000. Joint project between EBI and Sanger. Funded primarily by the Wellcome Trust, with additional funding from EMBL, NIH-NIAID, EU, BBSRC and MRC.

10 Species in Ensembl [Figure: tree of animal groups drawn against a geological timescale in MYBP, Cambrian to Tertiary. Labels: mammals, placentals, monotremes, marsupials; birds, passerines, paleognaths, other birds; reptiles, crocodiles, turtles, lizards; amphibians; fishes, teleosts, sharks, rays, Latimeria, bichir/Polypterus, lungfishes; agnathans; non-vertebrates]

11 What is an automated pipeline? Automatic annotation pipeline (Ensembl): gene building all at once, for the whole genome. Manual curation (Havana; VEGA: Vertebrate Genome Annotation): case-by-case basis.

12 Ensembl Genes: biological basis. All Ensembl gene predictions are based on proteins and mRNAs in UniProt/Swiss-Prot (manually curated), UniProt/TrEMBL, and NCBI RefSeq (manually curated). [Diagram: protein/mRNA sequence, assembly, Ensembl genes]

13 Genes and Transcripts in Ensembl: Ensembl known genes or transcripts; Ensembl novel genes or transcripts; Ensembl EST genes or transcripts. Non-Ensembl genes: imports for yeast, C. elegans, fly, mosquito, Takifugu and Tetraodon.

14 What annotation is available? Gene/transcript/peptide models, coding and non-coding (ncRNAs); IDs in other databases; mapped cDNAs, peptides, microarray probes, BAC clones etc.; other features of the genome: cytogenetic bands, markers, repeats etc.; comparative data: orthologues and paralogues, protein families, whole-genome alignments, syntenic regions; variation data: SNPs; regulatory data: best-guess set of regulatory elements from ENCODE; data from external sources (DAS)

15 How is this information organised? Ensembl Views (website); Ensembl database (open source); BioMart data-mining tool
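
As a rough illustration of the BioMart data-mining route, a BioMart query can be written as a small XML document that names a dataset, some filters and the attributes to return. The sketch below only builds such a document and does not send it anywhere. The helper name `build_biomart_query`, the dataset name `hsapiens_gene_ensembl`, and the filter and attribute names are assumptions for illustration, not taken from the slides; check them against the BioMart interface before use.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_biomart_query(dataset: str, attributes: List[str],
                        filters: Dict[str, str]) -> str:
    """Return a BioMart-style XML query as a string (hypothetical helper).

    Nothing is sent over the network; the caller decides where to submit it.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",       # tab-separated output
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which records are returned.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes are the columns of the result table.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed dataset, filter and attribute names, for illustration only.
example_query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "ensembl_transcript_id"],
    filters={"chromosome_name": "21"},
)
```

The same query can also be assembled interactively on the BioMart web page, which is the route the hands-on session follows.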

16 Names in Ensembl: ENSG### (Ensembl Gene ID), ENST### (Ensembl Transcript ID), ENSP### (Ensembl Peptide ID), ENSE### (Ensembl Exon ID). For species other than human, a three-letter species code is added after ENS: MUS (Mus musculus) for mouse: ENSMUSG###; DAR (Danio rerio) for zebrafish: ENSDARG###, etc.
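
The naming scheme above is regular enough to parse mechanically. The following is a minimal sketch, assuming an ID is "ENS", an optional three-letter species code, one feature letter (G, T, P or E) and a run of digits. The names `parse_ensembl_id` and `EnsemblId` are hypothetical, and version suffixes such as ".1" are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional

# Feature letters as listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# "ENS", optional three-letter species code, one feature letter, then digits.
_ID_PATTERN = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d+)$")


class EnsemblId(NamedTuple):
    species_code: Optional[str]  # None for human IDs, e.g. "MUS" for mouse
    feature_type: str            # "gene", "transcript", "peptide" or "exon"
    number: str                  # digits kept as text to preserve leading zeros


def parse_ensembl_id(identifier: str) -> EnsemblId:
    """Split an Ensembl stable ID into species code, feature type and number."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised Ensembl ID: {identifier!r}")
    code, letter, digits = match.groups()
    return EnsemblId(code, FEATURE_TYPES[letter], digits)
```

For example, `parse_ensembl_id("ENSMUSG00000017167")` yields the species code `"MUS"` and the feature type `"gene"`, while a human ID such as `"ENST00000380152"` has no species code.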

17 Gene Structure in Ensembl: Calmodulin (chicken): no UTRs; Calmodulin (human): UTRs annotated
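
One way to picture the difference between the two calmodulin models is to compare the exon extent with the coding region: a transcript has annotated UTRs only if its exons extend beyond the coding sequence. The sketch below assumes a simplified, hypothetical `Transcript` record on the forward strand with 1-based inclusive coordinates; it is not the Ensembl API, and the coordinates are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    exons: List[Tuple[int, int]]  # (start, end), 1-based inclusive, forward strand
    cds_start: int                # first coding base
    cds_end: int                  # last coding base


def utr_lengths(t: Transcript) -> Tuple[int, int]:
    """Return (5' UTR length, 3' UTR length) in exonic bases."""
    five = three = 0
    for start, end in t.exons:
        if start < t.cds_start:
            # Exonic bases upstream of the coding region.
            five += min(end, t.cds_start - 1) - start + 1
        if end > t.cds_end:
            # Exonic bases downstream of the coding region.
            three += end - max(start, t.cds_end + 1) + 1
    return five, three


# Made-up coordinates: one model with UTRs, one where exons equal the CDS.
with_utrs = Transcript(exons=[(100, 200), (300, 400)], cds_start=150, cds_end=350)
without_utrs = Transcript(exons=[(100, 200), (300, 400)], cds_start=100, cds_end=400)
```

A model like `without_utrs`, where both lengths are zero, corresponds to the "no UTRs" case on the slide.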

18 Help and Information: Comments and questions? View our help videos. Mailing lists: Come visit our blog! Online course:

19 Old View

20 New View!

21 Ensembl Team September 2008 Ensembl Vertebrate Genomics Software Comparative Genomics Functional Genomics Variation Analysis and Annotation Web Team Zebrafish VectorBase Outreach Systems & Support Research Ensembl Strategy Paul Flicek (EBI), Steve Searle (Sanger Institute) Mario Caccamo, Laura Clark, Jonathan Hinton, Zam Iqbal, Vasudev Kumanduri, Ilkka Lappalainen Glenn Proctor, Syed Haider, Andrew Jenkinson, Andreas Kähäri, Stephen Keenan, Rhoda Kinsella, Eugene Kulesha, Ian Longden, Daniel Rios Javier Herrero, Kathryn Beal, Benoît Ballester, Stephen Fitzgerald, Leo Gordon, Albert Vilella Nathan Johnson, Stefan Gräf, Steven Wilder Fiona Cunningham, Yuan Chen Bronwen Aken, Julio Banet, Susan Fairley, Jan-Hinnerck Vogel, Simon White, Amonida Zadissa James Smith, Eugene Bragin, Anne Parker, Bethan Pritchard, Steve Trevanion (VEGA) Kerstin Howe, Britt Reimholz, James Torrance Dan Lawson, Martin Hammond, Karyn Megy Xosé M Fernández, Bert Overduin, Michael Schuster (QC), Giulietta Spudich Guy Coates, Tim Cutts, Shelley Goddard Ian Dunham, Damian Keefe, Alison Meynert, Dace Ruklisa, Guy Slater, Daniel Zerbino Ewan Birney, Richard Durbin, Tim Hubbard

22 Cambridge Sean T. McHugh (

23 Training... Somewhere near you