FINDING GENES AND EXPLORING THE GENE PAGE AND RUNNING A BLAST (Exercise 1)

Similar documents
Finding Genes, Building Search Strategies and Visiting a Gene Page

Finding Genes, Building Search Strategies and Visiting a Gene Page

Hands-On Four Investigating Inherited Diseases

Browser Exercises - I. Alignments and Comparative genomics

Investigating Inherited Diseases

FUNCTIONAL BIOINFORMATICS

Annotation Walkthrough Workshop BIO 173/273 Genomics and Bioinformatics Spring 2013 Developed by Justin R. DiAngelo at Hofstra University

Exploring Proteomics Data (Exercise 12)

Interpreting RNA-seq data (Browser Exercise II)

BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

HC70AL SUMMER 2014 PROFESSOR BOB GOLDBERG Gene Annotation Worksheet

Genetic Exercises. SNPs and Population Genetics

COMPUTER RESOURCES II:

Week 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

Mass Spectrometry at EuPathDB

Genomic Annotation Lab Exercise By Jacob Jipp and Marian Kaehler Luther College, Department of Biology Genomics Education Partnership 2010

user s guide Question 1

Functional Genomics I Proteomics

BME 110 Midterm Examination

Annotating Fosmid 14p24 of D. Virilis chromosome 4

Annotation of a Drosophila Gene

Genetic Exercises SNPs and Population Genetics

SeattleSNPs Interactive Tutorial: Database Inteface Entrez, dbsnp, HapMap, Perlegen

MODULE 5: TRANSLATION

Genetic Exercises. SNPs and Population Genetics

Annotation of contig27 in the Muller F Element of D. elegans. Contig27 is a 60,000 bp region located in the Muller F element of the D. elegans.

Draft 3 Annotation of DGA06H06, Contig 1 Jeannette Wong Bio4342W 27 April 2009

MODULE 1: INTRODUCTION TO THE GENOME BROWSER: WHAT IS A GENE?

Collect, analyze and synthesize. Annotation. Annotation for D. virilis. Evidence Based Annotation. GEP goals: Evidence for Gene Models 08/22/2017

Interpreting RNA- seq data (beta)

Collect, analyze and synthesize. Annotation. Annotation for D. virilis. GEP goals: Evidence Based Annotation. Evidence for Gene Models 12/26/2018

Lab Week 9 - A Sample Annotation Problem (adapted by Chris Shaffer from a worksheet by Varun Sundaram, WU-STL, Class of 2009)

Annotation Practice Activity [Based on materials from the GEP Summer 2010 Workshop] Special thanks to Chris Shaffer for document review Parts A-G

A tutorial introduction into the MIPS PlantsDB barley&wheat database instances

Why study sequence similarity?

FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE

Chimp Sequence Annotation: Region 2_3

Sequence Based Function Annotation

Exercise I, Sequence Analysis

Small Exon Finder User Guide

HC70AL Spring An Introduction to Bioinformatics -- Part I. Brandon Le. April 6, What is a Gene? An ordered sequence of nucleotides

UCSC Genome Browser. Introduction to ab initio and evidence-based gene finding

HC70AL Spring 2011! An Introduction to Bioinformatics! By!! Brandon Le! April 7, 2011!

What is a Gene? HC70AL Spring An Introduction to Bioinformatics -- Part I. What are the 4 Nucleotides By in DNA?

BIOINF525: INTRODUCTION TO BIOINFORMATICS LAB SESSION 1

Exploring a fatal outbreak of Escherichia coli using PATRIC

Population data, SNPs and alleles Exercise 4

Basic Bioinformatics: Homology, Sequence Alignment,

Agenda. Annotation of Drosophila. Muller element nomenclature. Annotation: Adding labels to a sequence. GEP Drosophila annotation projects 01/03/2018

A tutorial introduction into the MIPS PlantsDB barley&wheat databases. Manuel Spannagl&Kai Bader transplant user training Poznan June 2013

The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28.

Introduction to DNA-Sequencing

Agenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Laboratory 3: Detecting selection

Using the Genome Browser: A Practical Guide. Travis Saari

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

Finding and Exporting Data. Search

Outline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018

WHO/TDR Bioinformatics Workshop at ICGEB, New Delhi (2005) Index. Page Index. 1 Module 1: Artemis Prokaryotic. 2 Exercise 1 3

ab initio and Evidence-Based Gene Finding

Aaditya Khatri. Abstract

CS313 Exercise 1 Cover Page Fall 2017

Transcription Start Sites Project Report

Files for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz]

Annotating 7G24-63 Justin Richner May 4, Figure 1: Map of my sequence

Why learn sequence database searching? Searching Molecular Databases with BLAST

A Prac'cal Guide to NCBI BLAST

2. The dropdown box has a number of databases that are searchable. Select the gene option and search for dihydrofolate reductase.

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

Match the Hash Scores

SENIOR BIOLOGY. Blueprint of life and Genetics: the Code Broken? INTRODUCTORY NOTES NAME SCHOOL / ORGANISATION DATE. Bay 12, 1417.

BIO4342 Lab Exercise: Detecting and Interpreting Genetic Homology

Browsing Genomes with Ensembl Genomes

Analyzing an individual sequence in the Sequence Editor

Identification of Single Nucleotide Polymorphisms and associated Disease Genes using NCBI resources

Guided tour to Ensembl

Last Update: 12/31/2017. Recommended Background Tutorial: An Introduction to NCBI BLAST

Gene Identification in silico

How to view Results with Scaffold. Proteomics Shared Resource

Homework 4. Due in class, Wednesday, November 10, 2004


Overview of the next two hours...

user s guide Question 3

User s Manual Version 1.0

BIO 152 Principles of Biology III: Molecules & Cells Acquiring information from NCBI (PubMed/Bookshelf/OMIM)

Sequence Annotation & Designing Gene-specific qpcr Primers (computational)

3'A C G A C C A G T A A A 5'

MAKING WHOLE GENOME ALIGNMENTS USABLE FOR BIOLOGISTS. EXAMPLES AND SAMPLE ANALYSES.

Designing TaqMan MGB Probe and Primer Sets for Gene Expression Using Primer Express Software Version 2.0

Training materials.

TIGR THE INSTITUTE FOR GENOMIC RESEARCH

OncoMD User Manual Version 2.6. OncoMD: Cancer Analytics Platform

MODULE TSS1: TRANSCRIPTION START SITES INTRODUCTION (BASIC)

(a) (3 points) Which of these plants (use number) show e/e pattern? Which show E/E Pattern and which showed heterozygous e/e pattern?

Gene Annotation Project. Group 1. Tyler Tiede Yanzhu Ji Jenae Skelton

Protein Bioinformatics Part I: Access to information

Worksheet for Bioinformatics

Identifying Regulatory Regions using Multiple Sequence Alignments

Access to genes and genomes with. Ensembl. Worked Example & Exercises

Gene-centered databases and Genome Browsers

Transcription:

FINDING GENES AND EXPLORING THE GENE PAGE AND RUNNING A BLAST (Exercise 1) 1.1 Finding a gene using text search. Note: For this exercise use http://www.plasmodb.org a. Find all possible kinases in Plasmodium. Hint: use the keyword kinase (without quotations) in the Gene Text Search box. - How many genes did you get? - How many of those are in P. falciparum? How did you find this out? - What happens if you search using the word kinases? How many results did you return? b. How can you increase the number of possible kinases in your results? Hint: the search you did in a will miss things like 6-phosphofructokinase or kinases so you need to use a wild card in your search try kinase*, *kinase and *kinase* (without quotations). - Did you get more results? - Which one of the above wild card combinations gave you the largest number of kinases? - How can you quickly examine the genes that were identified using the key word *kinase* but not with the word kinase? Hint: You can easily do this by combining search strategies. Click on Add Step then select existing strategy :

- Select the right strategy from your list of Gene Strategies and combine the strategies with the correct operation: Which operation did you choose? - Do the results make sense? Do all the product names contain the word kinase? c. Find only the kinases that specifically have the word kinase in the gene product name. Hint: Use the text search page, the specific page where you can define the fields to be. There are many ways to navigate to the Text Search page. - How did you get there? - How many kinases have the word kinase in their product names? - Did you remember to use the wild card?

1.2 Combing text search results with results from other searches a. In exercise 1.1 you identified genes that have the word kinase somewhere in their product name. Can you now find out how many of these kinases are likely secreted? Hint: grow your search strategy by adding a step. Choose a search that identifies genes with likely secretory signal peptides. How did you combine the search results?

Which operation did you choose? b. Now that you have a list of possible secreted kinases, how would you expand this strategy even further? Hint: there is no wrong answer here. - From a biological standpoint what else would be interesting to know about these kinases? Add more searches to grow this strategy. - For example, how many of these secreted kinases also have transmembrane domains? c. In the above example, how can you define kinases that have either a secretory signal peptide AND/OR a transmembrane domain(s)? Hint: to do this properly you will have to employ the Nested Strategy feature. Why?

Notice the different results obtained in figures A (with nesting) and B (without nesting) below: A B 1.3 Visiting a specific gene page. Note: For this exercise use http://www.plasmodb.org a. Find the bifunctional dihydrofolate reductase-thymidylate synthase (DHFR-TS) gene (or, if you prefer, the apical membrane antigen 1 gene; AMA1) in P. falciparum. - How did you navigate to this gene? What other ways could you get there? (hint: what about using the gene ID? PF3D7_0417200 for DHFR-TS or PF3D7_1133400 for AMA1)

- What chromosome is this gene on? - How many exons does this gene have? - How many nucleotides of coding sequence? b. What genes are located upstream & downstream of DHFR-TS (AMA1) in P. falciparum? - Is synteny (chromosome organization) in this region maintained in other species? (hint: look in the genomic context section of the gene page). - How complete is the genome assembly for other species? (hint: in the genomic context section, each genome is displayed as genes and the underlying contig. Are all the contigs complete?) c. Does the P. falciparum DHFR-TS (or AMA1) gene contain any single Nucleotide Polymorphisms (SNPs)? (hint: SNPs are represented graphically in the genomic context section and also in a table called SNP Overview ). - What is the total number of SNPs in the gene? - What do the different color diamonds represent? (hint: mouse over the diamonds to get more information). - How many impact the predicted protein sequence? - What is the maximum number of SNPs per strain? - Is this likely to define the full spectrum of sequence variation in these particular strains? - How do these results compare with SNP distribution in other genes? (hint: look at SNPs in upstream and downstream genes). d. Is the DHFR-TS (or AMA1) gene expressed? Hint: look at the gene page sections entitled Protein and Expression you may have to click on the show link to reveal the underlying data). - What kinds of data in PlasmoDB provide evidence for expression? - At what life cycle stage is DHFR-TS (AMA1) most abundant? - Does this make sense? - How do the different life cycle microarray expression profiles compare to each other? - Are the results similar? What about RNA-sequence data, does it agree with microarray data? - How abundant is DHFR-TS (AMA1) protein? How confident are you of this analysis?

1.4 Finding a gene by BLAST. Note: For this exercise use http://www.toxodb.org Also, you need to open another window in your browser to retrieve the sequence we will use in our BLAST search, got to: http://goo.gl/6bgjz - Imagine that you generated an insertion mutant in Toxoplasma that is providing you with some of the most interesting results in your career! You sequence the flanking region and you are only able to get sequence from one side of the insertion (the sequence in the link above). You immediately go to ToxoDB to find any information about this sequence. What do you do? - Try running a BLAST search with this sequence (hint: you can get to the BLAST tool by clicking on the BLAST link under tools on the home page). - Which blast program should you use? (hint: try different combinations, just keep in mind that you have a nucleotide sequence so you have to use an appropriate BLAST program). Note on BLAST programs: blastp compares an amino acid sequence against a protein sequence database; blastn compares a nucleotide sequence against a nucleotide sequence database; blastx compares the six-frame conceptual translation products of a nucleotide sequence (both strands) against a protein sequence database;

tblastn compares a protein sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands); tblastx compares the six-frame translations of a nucleotide sequence against the six-frame translations of a nucleotide sequence database. 1. Choose your target data type what do you want to blast against? 2. Choose which BLAST program to use. 3. Choose your target organism.

- Are you getting any results from blastx? tblastn? What about blastn? - What is your gene? (hint: after running a blastn against Toxoplasma genomic sequence, click on the link to the genome browser. In the genome browser zoom out to see what gene is in the area). 1.5 More BLASTing in EuPathDB (optional). Note: for this exercise use http://www.eupathdb.org a. The first thing we will need to do is get a sequence to use for BLAST. Search for the keyword "dihydrofolate" (without quotations). (Hint: use the Gene Text Search on the upper right hand side of the EuPathDB home page). b. You should get multiple hits. Find the first one that is annotated as "dihydrofolate reductase-thymidylate synthase" (Look in the product description column). c. Once you find one, click on the gene and go to the gene page. It might be helpful to open the gene page in a new window or tab. d. Scroll down to the bottom of the page to the Sequences section. e. Copy the amino acid sequence and go back to EuPathDB (if you have not done so already, it might be helpful to open EuPathDB in a new window or tab). f. Go to the BLAST page from the EuPathDB home page. (hint: under Tools on the EuPathDB home page). g. Paste the amino acid sequence into the input window. h. Select target data type (start with "Proteins"). i. Select BLAST Program. (Hint: BLASTP). j. Select the target organism (let's start by selecting all, but you may need to exclude Amoebazoa). Click on "Get Answer". Based on the results you should have identified excellent hits in almost all pathogens in EuPathDB but can you find good hits in Giardia, Entamoeba or Trichomonas? Let's try a different BLAST method: k. Go back to the BLAST window. Change the target data type to Genome. l. Select the BLAST Program. Notice you cannot select BLASTP anymore. Try the other options. Notice how your input sequence type has to change when you select a different program. (Hint: TBLASTN is the one you need). m. Select all target organisms. Click on "Get Answer". Note that the results are still missing a dihydrofolate from Giardia and Trichomonas and Entamoeba. Let's try a different BLAST method. n. Go to your gene page window (in CryptoDB) and copy the nucleotide coding sequence.

o. Go to the BLAST window and paste the nucleotide sequence into the input window. p. Select the target data type (try different ones) and the BLAST program. Notice you can only select TBLASTX or BLASTN when your input sequence is nucleotide. (Hint: select TBLASTX). q. Select the target organisms. This time let's specifically only select Giardia, Entamoeba and Trichomonas. Click on "Get Answer". Getting frustrated? Not getting a hit for Giardia, Entamoeba and Trichomonas in this case is actually the correct answer! These organisms do not have dihydrofolate reductase or thymidylate synthetase activity.