Guided tour to Ensembl

1 Guided tour to Ensembl

2 Introduction
Introduction to the Ensembl project
Walk-through of the browser
Variations and Functional Genomics
Comparative Genomics
BioMart
Ensembl Genome browser, 7/29/2010, page 2

3 Ensembl
Ensembl is one of the world's primary resources for genomic research. It provides many levels of access with a high degree of flexibility:
Access to the human genome as well as the genomes of other model organisms.
BLAST searches against chromosomal DNA.
Download of genomic, genic and contig sequences.
Searches for all members of a given protein family.
Positions of SNPs in a gene you are working on, and their possible consequences.
Ensembl is also an all-round software and database system (SQL, Perl) that can be installed locally to enable complex data mining of the genome or large-scale sequence annotation:
Ensembl APIs (Perl, Java, Python)
BioMart (Perl; direct access to the SQL version of the Ensembl database)

4 Other Genome Browsers available
Ensembl Genome Browser
NCBI Map Viewer
UCSC Genome Browser

5 Differences between Ensembl and the UCSC and NCBI browsers
The gene set: automatic annotation based on mRNA and protein information
Comparative analysis (gene trees)
Integration with other databases (DAS)
BioMart (data-mining tool)
Programmatic access via the Perl API (open source)

6 Genomes available in Ensembl
50 species, most of them vertebrates.
Non-chordates: D. melanogaster, C. elegans, S. cerevisiae.

7 Genomes available at Ensembl
Vertebrates focus and other species (the species were shown as images on the original slide).

8 Ensembl annotation pipeline
All Ensembl gene predictions are based on experimental evidence such as protein and cDNA/mRNA sequences:
UniProt/Swiss-Prot (manually curated)
NCBI RefSeq (partially manually curated)
UniProt/TrEMBL (automatically annotated)
Untranslated regions (UTRs) are annotated to the extent supported by EMBL mRNA records. There is no guarantee that the available biological evidence is sufficient to predict complete UTRs.

9 Integrating known information
[Figure: cDNA and protein sequences aligned to the genome define the exons; exons are labelled Untranslated+Coding, Coding, or Untranslated.]

10 Ensembl shows one transcript with underlying evidence

11 VEGA/Havana
Automatic annotation pipeline (Ensembl): gene building all at once (whole genome).
Manual curation (Havana): case-by-case basis.
VEGA: Vertebrate Genome Annotation.

12 Gene and transcript types in Ensembl
Ensembl known transcripts
Ensembl novel transcripts
Ensembl merged transcripts (Havana/Ensembl)
EST clusters
More manual curation (SGD, WormBase, FlyBase)

13 What does an ID in Ensembl tell me?
ENSG###: Ensembl Gene ID
ENST###: Ensembl Transcript ID
ENSP###: Ensembl Peptide ID
ENSE###: Ensembl Exon ID
For species other than human, a species code is inserted after ENS: MUS (Mus musculus) for mouse gives ENSMUSG###, DAR (Danio rerio) for zebrafish gives ENSDARG###, etc.
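The naming scheme on this slide can be decoded mechanically. The following is a minimal sketch, assuming every ID has the shape ENS, an optional three-letter species code, one feature letter and a run of digits. The function name `parse_ensembl_id` and the short species table are illustrative only and are not part of any Ensembl API; the IDs used are placeholders in the right shape, not real records.

```python
import re

# Feature letters listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Species codes mentioned on the slide; human IDs carry no species code.
SPECIES_CODES = {"": "Homo sapiens", "MUS": "Mus musculus", "DAR": "Danio rerio"}

# ENS + optional 3-letter species code + feature letter + digits (assumed shape).
ID_PATTERN = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d+)$")


def parse_ensembl_id(stable_id):
    """Split an Ensembl-style ID into species, feature type and number."""
    match = ID_PATTERN.match(stable_id)
    if match is None:
        raise ValueError("not an Ensembl-style ID: " + stable_id)
    species_code = match.group(1) or ""
    return {
        "species": SPECIES_CODES.get(
            species_code, "unlisted species code " + species_code
        ),
        "feature": FEATURE_TYPES[match.group(2)],
        "number": match.group(3),
    }


# Placeholder IDs, used only to show the format.
print(parse_ensembl_id("ENSG00000000001"))
print(parse_ensembl_id("ENSMUSG00000000001"))
```

A table lookup is used for species so that further codes can be added without touching the pattern.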

14 Which annotations are available in Ensembl?
Non-coding RNAs (ncRNAs)
External references from other databases
Microarray probes, clone sets, BAC maps
Other features of the genome: repeats, CpG islands
Homologues and whole genome alignments: orthologues and paralogues, protein families, syntenic regions
Variation data: Single Nucleotide Polymorphisms, InDels
Regulatory data (a first guess at promoter and enhancer elements)
Data from external sources (DAS)

15 Sources of Variation in Ensembl
NCBI dbSNP. Imported: alleles, flanking sequence, frequencies. Calculated: position, transcript effect.
For human also: HGVbase; Affy GeneChip 100K and 500K Mapping Array; Affy Genome-Wide SNP Array 6.0; Ensembl-called SNPs (from Celera reads and Jim Watson's and Craig Venter's genomes).
For mouse, rat, dog and chicken also: Sanger- and Ensembl-called SNPs (other strains/breeds).
STAR project for rat; other projects.

16 Protein Signatures: InterPro
InterPro uses classifications from multiple protein signature databases and integrates these signatures, descriptions and definitions into one database.

17 How is all this information organised?
Ensembl Database (open source)
Ensembl Views (website)
BioMart data-mining tool

18 Walking tour through the website
Working example: the human rhodopsin (RHO) gene. The following points will be addressed:
The Gene Summary tab and gene-related links: Are there splice variants? Can I view the genomic sequence with variations? Find orthologues and paralogues.
The Transcript tab and related links: What is the protein sequence? What matching proteins and mRNAs are found in other databases? Gene Ontology.
The Location tab and related links: What is the conservation track? How do I zoom in and change the gene focus? Un-stacking a track (e.g. human cDNAs). Exporting a sequence and running BLAT/BLAST.

19 Walking tour through the website
Start by going to the Ensembl home page.

20 Searching with a gene name
Type the gene name RHO into the search bar and click the Go button.

21 Finding what I am looking for
Click on the first hit.

22 Location Tab
The view shows the genes neighbouring RHO and the RHO transcripts. Ensembl Location displays are highly configurable. Use the Configure link to:
add Sequence variants (all sources) to the display;
choose to view the Human UniProt prot. track in normal, expanded form;
click on the Multiple alignments menu and choose the three tracks for the 33 eutherian mammals (including Conservation score and Constrained elements).

23 What are the links in the navigation column?
Select the Gene tab. How can we view the genomic sequence?

24 How to define what we want to see
How can we configure the view?
Display variations
Number of lines

25 How to export a sequence
After investigating the Location/Gene display, we would like to export the genomic sequence. Click the Export data option and click Next.

26 Comparative Genomics: Genomic alignments
How can we view existing genomic alignments? Use Genomic alignments to see a nucleotide view of the whole genome alignments. Select the 12 eutherian mammals, EPO. (The EPO pipeline refers to the programs behind the whole genome alignments.) Identical nucleotides are highlighted in blue.

27 Types of Homologues
Orthologues: any pairwise gene relation where the ancestor node is a speciation event.
Paralogues: any pairwise gene relation where the ancestor node is a duplication event.
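The two definitions translate directly into a lookup on a gene tree: find the last common ancestor of two genes and read off its event type. The sketch below assumes a toy tree in which every internal node is annotated as a speciation or a duplication; the `Node` class, the function names and the gene labels are hypothetical and do not reflect how Ensembl stores its trees.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    event: Optional[str] = None  # "speciation" or "duplication" for internal nodes
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Return the set of gene labels found below (or at) this node."""
    if not node.children:
        return {node.label}
    found = set()
    for child in node.children:
        found |= leaves(child)
    return found


def last_common_ancestor(node, gene_a, gene_b):
    """Return the deepest node whose subtree contains both genes."""
    for child in node.children:
        below = leaves(child)
        if gene_a in below and gene_b in below:
            return last_common_ancestor(child, gene_a, gene_b)
    return node


def homology_type(root, gene_a, gene_b):
    """Classify a gene pair by the event at its last common ancestor."""
    genes = leaves(root)
    if gene_a == gene_b or gene_a not in genes or gene_b not in genes:
        raise ValueError("need two different genes from the tree")
    ancestor = last_common_ancestor(root, gene_a, gene_b)
    return "orthologue" if ancestor.event == "speciation" else "paralogue"


# Toy tree: one ancient duplication, then a human/mouse speciation in each copy.
tree = Node("root", "duplication", [
    Node("copyA", "speciation", [Node("human_A"), Node("mouse_A")]),
    Node("copyB", "speciation", [Node("human_B"), Node("mouse_B")]),
])
print(homology_type(tree, "human_A", "mouse_A"))
print(homology_type(tree, "human_A", "human_B"))
```

Recomputing the leaf sets at every level keeps the sketch short; a real implementation would cache them.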

28 Viewing Trees in Ensembl
Now let's click on Gene tree (image), which displays the current gene in the context of a phylogenetic tree used to determine orthologues and paralogues.
Click the Orthologues link to view homologues detected in this tree.
Click on any node (square) to reveal the taxonomic level, or to collapse or expand a subtree.

29 Orthologue Types in Table
What is 1 to 1? One-to-one orthologues: in both species, there is only one corresponding orthologue.
What is 1 to many? One-to-many or many-to-many orthologues: in at least one of the two species, the gene was duplicated after speciation.
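The labels in the orthologue table follow from how many copies of the gene each species carries. A minimal sketch, assuming the copy counts for the two species are already known; the function name is illustrative.

```python
def orthologue_type(copies_in_species_a, copies_in_species_b):
    """Name the orthology relation from the gene copy counts in two species."""
    if copies_in_species_a < 1 or copies_in_species_b < 1:
        raise ValueError("each species needs at least one copy")
    if copies_in_species_a == 1 and copies_in_species_b == 1:
        return "one-to-one"
    if copies_in_species_a == 1 or copies_in_species_b == 1:
        # Duplication after speciation in exactly one of the two species.
        return "one-to-many"
    # Duplication after speciation in both species.
    return "many-to-many"


print(orthologue_type(1, 1))
print(orthologue_type(1, 3))
print(orthologue_type(2, 2))
```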

30 How to interpret the Trees
A blue square is a speciation event (orthologues).
A red square is a duplication event (paralogues).

31 Gene Tab: Transcript information for a gene
Go back to the Gene tab. Now, let's focus more closely on a transcript (spliced mRNA). Select the longer transcript from the table (ENST ). This leads to the Transcript summary display. Again, the left-hand navigation column provides several options for this particular transcript.

32 Visualizing Exons for a transcript
Choose the Exons option first, which highlights exon sequences (exons, introns and flanking sequence are shown).
Colour key: flanking sequences (green), UTRs (purple), introns (blue), coding sequence (CDS) (black).
You may use the Configure link to change the display: to show more flanking sequence, or to show full introns.

33 Obtaining external identifiers for a transcript
General identifiers
Other views in the External References menu include microarray probes and Gene Ontology terms from the GO Consortium.

34 Protein summary: mapped domains and signatures
The display shows the Ensembl protein with signatures mapped to the sequence. Clicking on Domains & features shows a table of protein signatures.

35 Genomic Variation: SNPs
Polymorphism: a DNA variation in which each possible sequence is present in at least 1% of the population.
Most polymorphisms (~90%) take the form of SNPs (Single Nucleotide Polymorphisms): variations that involve just one nucleotide.
SNPs: base pair substitutions.
InDels: insertions/deletions (frameshifts).
SNPs occur about once in every 300 bp (human), and mammalian genomes contain ~3 billion base pairs!
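The two figures on this slide imply an order of magnitude for the number of SNPs in a genome, and the 1% rule can be written as a simple check. This is a back-of-the-envelope sketch using only the slide's round numbers; the function names are illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000   # ~3 billion base pairs (slide figure)
BP_PER_SNP = 300                 # roughly one SNP every 300 bp (slide figure)
MIN_ALLELE_FREQUENCY = 0.01      # the 1% threshold in the definition


def expected_snp_count(genome_size_bp=GENOME_SIZE_BP, bp_per_snp=BP_PER_SNP):
    """Rough number of SNPs implied by the density quoted on the slide."""
    return genome_size_bp // bp_per_snp


def is_polymorphism(allele_frequencies):
    """True when at least two alleles exist and each reaches the 1% threshold."""
    return len(allele_frequencies) >= 2 and all(
        frequency >= MIN_ALLELE_FREQUENCY for frequency in allele_frequencies
    )


print(expected_snp_count())            # 10000000
print(is_polymorphism([0.97, 0.03]))   # True
print(is_polymorphism([0.995, 0.005])) # False
```

In other words, the slide's numbers correspond to roughly ten million SNP positions.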

36 Genomic Variation: Functional types of SNPs
Non-synonymous SNP: SNP in a coding area that alters the aa sequence. Consequence: cause of most monogenic disorders, e.g. cystic fibrosis (CFTR), hemophilia (F8).
Synonymous SNP: SNP in a coding area that doesn't alter the aa sequence. Consequence: may affect splicing.
Regulatory SNP: SNP in a promoter or regulatory region. Consequence: may affect the level, location or timing of gene expression.
SNPs in other regions: no direct known impact on phenotype; useful as markers.

37 SNPs in Ensembl: types
Non-synonymous: in coding sequence, resulting in an aa change
Synonymous: in coding sequence, not resulting in an aa change
Frameshift: in coding sequence, resulting in a frameshift
Stop lost: in coding sequence, resulting in the loss of a stop codon
Stop gained: in coding sequence, resulting in the gain of a stop codon
Essential splice site: in the first 2 or the last 2 base pairs of an intron
Splice site: 1-3 bp into an exon or 3-8 bp into an intron
Upstream: within 5 kb upstream of the 5' end of a transcript
Regulatory region: in a regulatory region annotated by Ensembl
5' UTR: in 5' UTR
Intronic: in intron
3' UTR: in 3' UTR
Downstream: within 5 kb downstream of the 3' end of a transcript
Intergenic: more than 5 kb away from a transcript
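The positional categories at the end of this list (Upstream, Downstream, Intergenic) depend only on the distance to the transcript and on its strand. The sketch below illustrates just that rule under simplifying assumptions: one transcript, genomic coordinates with start <= end, strand given as 1 or -1, and the 5 kb boundary counted as inclusive. It is not the Ensembl consequence calculation, and the function name is hypothetical.

```python
FLANK_BP = 5000  # the 5 kb window from the slide


def flank_consequence(position, tx_start, tx_end, strand):
    """Classify a position relative to one transcript's genomic span."""
    if tx_start > tx_end:
        raise ValueError("tx_start must not exceed tx_end")
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if tx_start <= position <= tx_end:
        return "within transcript"
    if position < tx_start:
        distance = tx_start - position
        # Lower coordinates lie before the 5' end only on the forward strand.
        side = "upstream" if strand == 1 else "downstream"
    else:
        distance = position - tx_end
        side = "downstream" if strand == 1 else "upstream"
    return side if distance <= FLANK_BP else "intergenic"


print(flank_consequence(9000, 10000, 20000, 1))    # upstream
print(flank_consequence(9000, 10000, 20000, -1))   # downstream
print(flank_consequence(4000, 10000, 20000, 1))    # intergenic
```

The same position flips between upstream and downstream when the transcript is on the reverse strand, which is why the strand has to be part of the input.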

38 SNP information in Ensembl
Most SNPs are imported from dbSNP (rs IDs). Imported data: alleles, flanking sequences, frequencies. Calculated data: position, synonymous status, peptide shift.
For human also: HGMD; HGVS (Human Genome Variation Society); Affy GeneChip 100K and 500K Mapping Array; Affy Genome-Wide SNP Array 6.0; Ensembl-called SNPs (from Celera reads and Jim Watson's and Craig Venter's genomes).
Organisms with SNP information: Human, Chimp, Mouse, Rat, Dog, Cow, Platypus, Chicken, Zebrafish, Tetraodon, Mosquito.

39 Variation Table
View a table of variations for each transcript from the Gene tab.

40 Variation Table: Variation Image
View variations drawn along a transcript with Variation Image. Human, mouse, rat, dog and cow have individual or strain comparisons.

41 SNP Effect Calculator
Click on Manage your data at the left of any page. Follow the link to SNP Effect Predictor. Paste in variation positions and alleles.

42 SNP Effect Calculator
The location, the variation name in Ensembl, and the consequence on the amino acid sequence are returned.

43 Data Mining with BioMart

44 BioMart: Data Mining
BioMart is a search engine that can find multiple terms and put them into a table format. It allows high-throughput searches, such as retrieving mouse gene IDs together with chromosome and base pair position. No programming required!

45 What kind of tables can I generate?
Both very specific and very general ones:
All the genes for one species
Or only genes in one specific region of a chromosome
Or genes in one region of a chromosome associated with an InterPro domain

46 The first Step: Choose the Dataset
Choose Ensembl Genes 58, then choose the current Human dataset.

47 Second Step: select the filters
Narrow the gene set by clicking Filters on the left. Click on the + in front of REGION to expand the choices.

48 Second Step: select the filters
Select Chromosome X. Select Band Start q28 and End q28. Expand the GENE panel.

49 Second Step: select the filters
Limit to genes with CCDS ID(s). Consensus Coding Sequences (CCDS) are assigned when all genome annotation groups agree on a model. Click Count to see how many genes have passed these filters.

50 Selecting Output Columns: Type of Output
Select Features and expand the GENE panel.

51 Selecting Output Columns: Attributes
Select the type of output from the 6 different possibilities (Features). Select, along with the default options, Associated Gene Name (this shows the gene symbol from HGNC). Add as many attributes as desired. A summary of the selected columns is shown.

52 Selecting Output Columns: Attributes
Expand the EXTERNAL panel to select External References. Select the type of output from the 6 different possibilities (Features).

53 Results
Click Results to preview the output. Decide how you want to export the results:
To save a file of the complete table, click Go.
Or, email the results to any address.
Or, View ALL rows as HTML.
Go back and change Filters or Attributes if desired.
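The dataset, filter and attribute choices made in the steps above can also be written down as a BioMart XML query, the text form BioMart displays for a configured query. The following is a sketch only: it builds the XML string and sends nothing. The dataset name `hsapiens_gene_ensembl` and the internal filter and attribute names (`chromosome_name`, `band_start`, `band_end`, `with_ccds`, `ensembl_gene_id`, `ensembl_transcript_id`, `external_gene_name`) are assumptions that vary between releases and should be checked against the BioMart instance being used.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, value_filters, boolean_filters, attributes):
    """Return a BioMart-style XML query string (nothing is sent anywhere)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset_node = ET.SubElement(
        query, "Dataset", {"name": dataset, "interface": "default"}
    )
    for name, value in value_filters.items():
        ET.SubElement(dataset_node, "Filter", {"name": name, "value": value})
    for name in boolean_filters:
        # excluded="0" keeps only the genes that have this property.
        ET.SubElement(dataset_node, "Filter", {"name": name, "excluded": "0"})
    for name in attributes:
        ET.SubElement(dataset_node, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed internal names mirroring the walkthrough: chromosome X, band q28, CCDS only.
query_xml = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    value_filters={"chromosome_name": "X", "band_start": "q28", "band_end": "q28"},
    boolean_filters=["with_ccds"],
    attributes=["ensembl_gene_id", "ensembl_transcript_id", "external_gene_name"],
)
print(query_xml)
```

Keeping the query as data makes it easy to rerun the same selection later or to change one filter without clicking through the interface again.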

54 Selecting Output Columns: Type of Output
Select Sequences and expand the SEQUENCE panel. Select which sequences you want: CDS, entire gene, exons, promoter regions, UTRs. Expand HEADER to customise the FASTA header for your sequences.

55 Results
>Header: Gene ID, Transcript ID, Gene Name, Chromosome
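The header layout on this slide can be reproduced with a few lines of code, which helps when parsing or regenerating such records. A minimal sketch: the `|` separator, the 60-character line width and the function name are assumptions, and the IDs and sequence in the example are placeholders rather than real records.

```python
def format_fasta(gene_id, transcript_id, gene_name, chromosome, sequence, width=60):
    """Build one FASTA record with the four header fields from the slide."""
    header = ">" + "|".join([gene_id, transcript_id, gene_name, chromosome])
    # Wrap the sequence into fixed-width lines, as is conventional for FASTA.
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body)


# Placeholder IDs and sequence, used only to show the layout.
record = format_fasta(
    "ENSG00000000001", "ENST00000000001", "RHO", "3", "ACGT" * 20
)
print(record)
```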

56 Other Export Options (Attributes)
Sequences: UTRs, flanking sequences, cDNA and peptides, etc.
Gene IDs from Ensembl and external sources (MGI, Entrez, etc.)
Microarray data
Protein functions/descriptions (InterPro, GO)
Orthologous gene sets
SNP/variation data

57 Mapping a sequence to the genome: BLAST, BLAT
Click on the BLAST/BLAT link in the bar at the top of the page. Paste the sequence into the appropriate box and select BLAT as the search algorithm. Click Run.

58 Mapping a sequence to the genome: BLAST, BLAT
Links in the results: an alignment [A], the query sequence [S], the genome sequence [G], Location View [C]. Clicking on [C] should reveal the BLAT hit in Region in detail.

59 Mapping a sequence to the genome: BLAST, BLAT


Transcription Start Sites Project Report

Transcription Start Sites Project Report Transcription Start Sites Project Report Student name: Student email: Faculty advisor: College/university: Project details Project name: Project species: Date of submission: Number of genes in project:

More information

MODULE 5: TRANSLATION

MODULE 5: TRANSLATION MODULE 5: TRANSLATION Lesson Plan: CARINA ENDRES HOWELL, LEOCADIA PALIULIS Title Translation Objectives Determine the codons for specific amino acids and identify reading frames by looking at the Base

More information

TIGR THE INSTITUTE FOR GENOMIC RESEARCH

TIGR THE INSTITUTE FOR GENOMIC RESEARCH Introduction to Genome Annotation: Overview of What You Will Learn This Week C. Robin Buell May 21, 2007 Types of Annotation Structural Annotation: Defining genes, boundaries, sequence motifs e.g. ORF,

More information

Agenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence

Agenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence Agenda GEP annotation project overview Web Databases for Drosophila An introduction to web tools, databases and NCBI BLAST Web databases for Drosophila annotation UCSC Genome Browser NCBI / BLAST FlyBase

More information

Files for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz]

Files for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz] BLAST Exercise: Detecting and Interpreting Genetic Homology Adapted by W. Leung and SCR Elgin from Detecting and Interpreting Genetic Homology by Dr. J. Buhler Prequisites: None Resources: The BLAST web

More information
