COMPUTER RESOURCES II: Using the computer to analyze data, using the internet, and accessing online databases

Bio 210, Fall 2006
Linda S. Huang, Ph.D.
University of Massachusetts Boston

In the first computer lab, we discussed how using the computer is important to modern-day cell biologists. In this lab, you will use the computer to analyze a DNA sequence that will be provided to you. You will examine this DNA sequence for a protein-coding region (often referred to as an open reading frame, or ORF). Once you determine the protein most likely encoded by this piece of DNA, you will take the translated protein sequence and search an online database to identify the protein. Finally, you will use an organism-specific database to learn more about your protein.

Before you begin the computer work, your TA will give you a short double-stranded DNA sequence and ask you to translate it in all six possible reading frames, using the codon tables below. Please complete this assignment first, hand it in to the TA, and then proceed to the next part of the lab.

To begin the computer-assisted translation, get your DNA sequence from your TA. Note that this DNA sequence contains the entire Open Reading Frame (ORF) for a yeast (Saccharomyces cerevisiae) gene. There are a few important details to note:

The protein encoded by the DNA sequence is more than 80 amino acids long.

The DNA sequence your TA has given you does not contain any introns. (Only 5% of S. cerevisiae genes contain introns, unlike human genes, where essentially all protein-coding genes contain introns. It is simpler to do the bioinformatics of protein prediction using DNA without introns.)

This sequence is a single strand of DNA. By convention, the first nucleotide is at the 5' end and the last nucleotide is at the 3' end. The reverse complement of this strand can be easily inferred from the sequence you have.
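As an illustration of how the reverse complement is inferred, here is a minimal Python sketch. The function name and the example sequence are made up for this illustration and are not part of the lab materials.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    # Complement every base, then reverse so the result also reads 5' to 3'.
    return dna.translate(COMPLEMENT)[::-1]


# Hypothetical 9-nucleotide example, not the sequence handed out by the TA.
top_strand = "ATGGCCTAA"
print(reverse_complement(top_strand))  # TTAGGCCAT
```

Taking the reverse complement twice returns the original strand, which is a quick way to check a hand-written answer.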
Step 1: Figure out the protein sequence that your DNA sequence encodes

Since we have told you that your DNA sequence encodes a protein, you should be able to figure out what protein it encodes. You should recall from class that the DNA is first transcribed into mRNA, and that the mRNA is then translated into protein by tRNAs that recognize the mRNA three nucleotides (one codon) at a time.
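The idea of reading a sequence as triplets can be sketched in a few lines of Python. This assumes the DNA you are given is the coding (top) strand, so the mRNA has the same sequence with U in place of T; the function names and the example sequence are illustrative only.

```python
def transcribe(coding_dna: str) -> str:
    """Write the mRNA that matches a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def split_into_codons(seq: str) -> list:
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical example; the trailing G does not form a complete codon.
mrna = transcribe("ATGGCCTAAG")
print(split_into_codons(mrna))  # ['AUG', 'GCC', 'UAA']
```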

The genetic code is provided below (from Figure 7-24 of Essential Cell Biology, ed. 3):

Here is an alternative layout of the genetic code:

You should also recall how the ribosome translates an mRNA molecule (see Figures 7-33, 7-34, and 7-37 in Essential Cell Biology, ed. 3 for review).

One way to translate your DNA sequence is to compare the sequence to the genetic code by hand and determine the proteins that could potentially be encoded by the DNA. There are also computer programs that can do the same thing. However, you should understand conceptually how to translate DNA into protein, just as you should know how to add even though you own a calculator.

A particularly good program for translating DNA into protein is JavaScript DNA Translator 1.1. This program can be found at: http://www.annular.org/~sdbrown/dna/translator.html

To use this program:

Copy the sequence that your TA gave you and paste it in the box labeled "Sequence: DNA Only or FASTA format".

Using the menus, you can choose to receive the output in either a 3-letter or a 1-letter amino acid representation. Choose 1-letter.

Be sure to choose 6 reading frames for the Reading Frame.

The Line Length choice determines how long the lines of your output file are; the default of 60 is fine.

When you are setting up your translations, UNCHECK the box that says "Display Translations of ORFs of at least".

The bottom selections of your computer screen should look like:

Now, hit the button that says Translate and you will get the output in another window in your browser. (Be patient, it can take a few seconds!)

Examine the output file. At the top, you will see the six-frame translation of your DNA sequence. At the bottom of the window, you will see the six translations listed separately.
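To make the six-frame translation concrete, here is a minimal Python sketch of what such a translator does. It is not the code of JavaScript DNA Translator 1.1. It assumes the standard genetic code and numbers frames 1-3 on the given strand and 4-6 on the reverse complement, which may not match the program's own numbering of frames 4-6. All names and the example sequence are illustrative.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for first, second, and third base.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate one frame to 1-letter amino acids; '*' marks a stop codon."""
    dna = dna.upper()
    # Unknown codons (for example, ones containing N) are written as 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Return {frame number: protein}; frames 4-6 read the reverse complement."""
    bottom = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[offset + 4] = translate(bottom[offset:])
    return frames


# Hypothetical example sequence, not the one handed out in lab.
for frame, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(frame, protein)
```

In this toy example, frame 1 reads M, A, and then a stop, which is the pattern to look for in your own output.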

1. Scroll to the bottom of the output window. You will see the six translations in six reading frames, highlighted in yellow. From these translations, determine which is the most likely ORF that encodes your protein. Hint: look for the longest possible ORF, starting with methionine and ending with a stop codon, which is indicated with an asterisk (*). Note the number of the frame that encodes this ORF. Copy this ORF, from methionine to the last amino acid, and paste it into the Word file with your gene, below the gene sequence.

2. Now look at the printout of your output, and find the DNA codon that encodes the first methionine in your ORF (methionine is encoded by AUG in the RNA, which corresponds to ATG in the DNA). In the output window, amino acid letters are positioned above the first nucleotide in each codon. Circle this ATG in your printout. Now return to your Word file with your gene sequence, find that codon in the sequence, and highlight it using BOLD face and 16 point font. Note that if your gene is encoded by reading frame 4, 5, or 6 (on the bottom strand of your DNA), then you have to look closer to the end of your gene's sequence, and look for the reverse complement of ATG (that is, CAT). (The reason for this is that in the Word file, only one strand is provided as your gene sequence, which the translator program treats as the top strand by default.)

3. Look at your printout again, and find the DNA codon that corresponds to the translation stop signal. The stop codons are TAA, TAG, and TGA, and one of them will correspond to the asterisk at the end of your ORF. Circle the stop codon in your printout, and return to the Word file with your gene. Find the stop codon there and highlight it using underline and 16 point font. Again, remember that if your gene is in frame 4, 5, or 6, you have to look near the beginning of your sequence, and search for the reverse complement of TAA, TAG, or TGA (that is, TTA, CTA, or TCA). Again, this is because only the top strand is given as the sequence for your gene in the Word file.

4. Print your amended sequence file.

Step 2: Find out the name of the protein that your DNA encodes.

Now that you have the protein sequence that your DNA encodes, you can use the databases to figure out the name of that protein. As you are probably aware, concerted efforts have been undertaken to determine the full DNA sequences of many different organisms (these are also referred to as genome projects, since they aim to determine the composition of the genome of a particular species). The Saccharomyces cerevisiae genome was completed in 1996 and was the first complete eukaryotic genome to be sequenced. Since then, many other genomes have been completed (including the first draft of the human genome, completed in 2001).

The DNA sequences obtained from these genome projects are available in public databases, and many programs exist that can assist you in searching these databases. Additionally, the proteins encoded by these DNA sequences can be predicted in much the same way as you did in Step 1 above; these predicted protein sequences have also been deposited into publicly available databases.
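Predicting a protein from DNA, as in Step 1, comes down to scanning each reading frame for the longest stretch that starts with ATG and ends at a stop codon. The sketch below is one simple way to do that, assuming an intron-free sequence as in this lab; real gene-prediction pipelines are more involved. It reports where the ORF sits on the top strand, which shows why an ORF in frame 4, 5, or 6 has its stop near the beginning of the given sequence and its start near the end. Function names and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def longest_orf(dna: str):
    """Find the longest ATG...stop ORF in all six frames.

    Returns (frame, start, end, n_amino_acids) or None if no ORF is found.
    Frames 1-3 are on the given (top) strand, 4-6 on the reverse complement.
    start and end are 0-based, end-exclusive positions on the TOP strand and
    span the start codon through the stop codon.
    """
    dna = dna.upper()
    n = len(dna)
    best = None
    for strand, seq in enumerate((dna, reverse_complement(dna))):
        for offset in range(3):
            orf_start = None
            for i in range(offset, n - 2, 3):
                codon = seq[i:i + 3]
                if orf_start is None and codon == "ATG":
                    orf_start = i  # first methionine after the previous stop
                elif orf_start is not None and codon in STOP_CODONS:
                    n_aa = (i - orf_start) // 3  # stop codon not counted
                    if best is None or n_aa > best[3]:
                        lo, hi = orf_start, i + 3
                        if strand == 1:
                            # Convert bottom-strand positions to top-strand ones.
                            lo, hi = n - hi, n - lo
                        best = (strand * 3 + offset + 1, lo, hi, n_aa)
                    orf_start = None
    return best


# Hypothetical sequence with a tiny ORF (ATG GCC TAA) on the top strand.
print(longest_orf("CCATGGCCTAAGG"))  # (3, 2, 11, 2)
# The same ORF seen from the other strand: reported in frame 6.
print(longest_orf("CCTTAGGCCATGG"))  # (6, 2, 11, 2)
```

In the second example, positions 2 to 11 of the given strand read TTAGGCCAT: the reversed stop (TTA) comes first and the reversed start (CAT) comes last.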

One of the more commonly used programs for searching the sequence databases is BLAST (Basic Local Alignment Search Tool), developed by the National Center for Biotechnology Information (NCBI). BLAST can be used in different ways:

to compare a DNA sequence to DNA sequences in the databases (blastn).

to compare an amino acid sequence to protein sequences in the databases (blastp).

to compare a DNA sequence translated in all six reading frames to all protein sequences in the databases (blastx).

You can access BLAST searches against ALL publicly available databases at: http://www.ncbi.nlm.nih.gov/blast/

However, since we know that you have a yeast protein, we are instead going to use a portal specifically designed for analyzing genomic information related to S. cerevisiae. This way, when you search with your protein sequence, you will not get all theoretical matches, but only those matches that are in the S. cerevisiae genome. You will also be able to obtain other information about your gene more easily.

Point your browser to: http://www.yeastgenome.org/cgi-bin/blast-sgd.pl

Paste your translated protein sequence into the box that says "Type or Paste a Query Sequence".

At the first drop-down menu, choose blastp as the appropriate BLAST program.

In the second box with options, choose "Open Reading Frames (DNA or Protein)".

Leave everything else as the default, and push the button to Run WU-BLAST.

A new page will come up giving you your BLAST search results. At the top of the page will be a graphical interface depicting the most significant matches from the S. cerevisiae protein database. For a beginner, a conceptually simpler output can be found in the middle of the page, with results that look like:

The results above are for a blastp search using a protein sequence of 388 amino acids against the database containing the translation of all standard S. cerevisiae ORFs. You can see a list of many sequences that produce high-scoring segment pairs with your protein query sequence. For the first high-scoring segment above, this list gives:

the official ORF name in bold (YPR043W)

the gene name (SMK1; this is also the name of the protein)

the database ID (SGDID:S000006258)

a brief description (Chr XVI from 666277-667443)

the BLAST score, in arbitrary units (2053; note that a larger number indicates a more significant match than a smaller number)

the probability that your match was random (3.8e-214; the smaller the number, the less likely the match was random)

You can see that the results are sorted, with the highest-scoring match presented first. If you now click on the probability (also called the E value) shown above in blue (today it may be pink), you skip to the area on that page that aligns the amino acid sequence of your query with the sequence found by BLAST. Since you know that your DNA encodes an S. cerevisiae protein, your query should be an exact match with the protein identified by blastp. The other proteins on the list have some similarity to your query, but should not be exact matches. For the query above, the best match was the SMK1 gene, encoded by ORF number YPR043W. The alignment looks like this:

4. You should write down the ORF name and gene name of the best match with the protein sequence that you queried. Put this information on a new line BELOW where you pasted the protein sequence.

Step 3: Learn something about the protein you are researching.

Now that you know the name of the protein, use the yeast database to learn something about its characteristics. There are many ways to go about doing this. The simplest method is as follows:

Point your browser to: www.yeastgenome.org

This is the home page of the Saccharomyces Genome Database (SGD), which contains a great deal of information about research involving Saccharomyces. For example, if you were to click on the Virtual Library link on the left (under External Links), you would pull up a page that includes other links to information about yeasts, including Yeast information for the nonspecialist. If you were to click on the BLAST link on the left (under Analysis and Tools), you would arrive at the page you used previously to determine the name of the gene that encodes your protein.

At the top of the page is a Quick Search box. Enter either the ORF number or the gene name and hit Submit. You should arrive at a summary page that has a lot of information about your gene. The top of the page will look something like this:

Spend some time scrolling around and clicking on the various links, including the tabs at the top of the page (e.g., Locus History, Literature, Phenotype) and the pull-down menus on the left-hand side.

To complete this laboratory exercise, find the following information about your protein using SGD, and put this information in the file you've been working on.

5. What chromosome is this gene on?

6. Provide the reference for a scientific paper that includes information about this gene.

7. What is the predicted molecular weight (MW) in Daltons (Da) for your protein?

For extra credit:

8. Does this gene have human homologs, and if so, what is the name of one of the human homologs?

Print out your file with the sequence you were given and the answers to questions 1-7 (or 1-8, if you did the extra credit), and give it to your TA before you leave the lab.
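Two numbers in this exercise can be sanity-checked with a few lines of Python: the protein length implied by the chromosomal coordinates in the BLAST description (666277-667443 in the SMK1 example), and a rough molecular weight for question 7. This is only a sketch under stated assumptions: the coordinates are taken to be inclusive, intron-free, and to include the stop codon; the molecular weight uses standard average residue masses plus one water, so it may differ slightly from the value SGD reports. Function names are illustrative.

```python
# Average masses (Da) of amino acid residues, i.e. after loss of water
# during peptide bond formation.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water for the free N- and C-termini


def protein_length_from_coordinates(start: int, end: int) -> int:
    """Amino acids encoded by an intron-free ORF spanning start..end inclusive.

    Assumes the span includes the stop codon, which encodes no amino acid.
    """
    length_nt = abs(end - start) + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length_nt // 3 - 1


def molecular_weight(protein: str) -> float:
    """Approximate average molecular weight in Da of a 1-letter protein sequence."""
    protein = protein.upper().rstrip("*")  # ignore a trailing stop symbol
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER_MASS


# Coordinates quoted in the BLAST description for the example hit.
print(protein_length_from_coordinates(666277, 667443))  # 388

# Hypothetical dipeptide (Gly-Ala), just to show the calculation.
print(round(molecular_weight("GA"), 2))  # 146.15
```

The first result agrees with the 388 amino acid query described for the example search. To estimate the weight of your own protein, pass the ORF you pasted into your Word file to molecular_weight.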