Pre-Lab Questions. 1. Use the following data to construct a cladogram of the major plant groups.

Similar documents
AP BIOLOGY. Investigation #3 Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST. Slide 1 / 32. Slide 2 / 32.

From AP investigative Laboratory Manual 1

Why Use BLAST? David Form - August 15,

Biotechnology Explorer

LAB. WALRUSES AND WHALES AND SEALS, OH MY!

BLASTing through the kingdom of life

Introduction to Bioinformatics CPSC 265. What is bioinformatics? Textbooks

BLASTing through the kingdom of life

BLASTing through the kingdom of life

Discover the Microbes Within: The Wolbachia Project. Bioinformatics Lab

Evolution 1: Evidence of Evolution LabCh. 16

COMPUTER RESOURCES II:

Last Update: 12/31/2017. Recommended Background Tutorial: An Introduction to NCBI BLAST

Files for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz]

Question 2: There are 5 retroelements (2 LINEs and 3 LTRs), 6 unclassified elements (XDMR and XDMR_DM), and 7 satellite sequences.

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases

Fig State the precise name of each of the parts of the DNA molecule labelled X, Y and Z. X... Y... Z... [3]

FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE

TIGR THE INSTITUTE FOR GENOMIC RESEARCH

Chimp Sequence Annotation: Region 2_3

3. human genomics clone genes associated with genetic disorders. 4. many projects generate ordered clones that cover genome

UCSC Genome Browser. Introduction to ab initio and evidence-based gene finding

PRACTICE TEST ANSWER KEY & SCORING GUIDELINES BIOLOGY

SAMPLE LITERATURE Please refer to included weblink for correct version.

Annotating Fosmid 14p24 of D. Virilis chromosome 4

Why learn sequence database searching? Searching Molecular Databases with BLAST

PRESENTING SEQUENCES 5 GAATGCGGCTTAGACTGGTACGATGGAAC 3 3 CTTACGCCGAATCTGACCATGCTACCTTG 5

ELE4120 Bioinformatics. Tutorial 5

Evolutionary Genetics. LV Lecture with exercises 6KP

Guided tour to Ensembl

Aaditya Khatri. Abstract

CHAPTER 21 LECTURE SLIDES

Determining Evolutionary Relationships Using BLAST

Teaching Bioinformatics in the High School Classroom. Models for Disease. Why teach bioinformatics in high school?

Amino Acid Sequences and Evolutionary Relationships. How do similarities in amino acid sequences of various species provide evidence for evolution?

COMPUTER SIMULATIONS AND PROBLEMS

Hands-On Four Investigating Inherited Diseases

user s guide Question 3

Basic Bioinformatics: Homology, Sequence Alignment,

Creation of a PAM matrix

IPA : Maximizing the Biological Interpretation of Gene, Transcript & Protein Expression Data with IPA

FINDING GENES AND EXPLORING THE GENE PAGE AND RUNNING A BLAST (Exercise 1)

BLAST. Subject: The result from another organism that your query was matched to.

Lecture #8 2/4/02 Dr. Kopeny

Computational Biology and Bioinformatics

DNA and Chromosomes. 2. What molecules make up the rungs of the ladder? 3. What keeps the two sides of the ladder paired?

BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

HEMOGLOBIN: PHYSIOLOGICAL EXPRESSION AND EVOLUTION OF GENE CLUSTER

Sequence Databases and database scanning

O C. 5 th C. 3 rd C. the national health museum

Chapter 11. How Genes Are Controlled. Lectures by Edward J. Zalisko

Your name: BSCI410-LIU/Spring 2007 Homework #2 Due March 27 (Tu), 07

Agenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence

BIOINFORMATICS IN BIOCHEMISTRY

Answer: Sequence overlap is required to align the sequenced segments relative to each other.

MAKING WHOLE GENOME ALIGNMENTS USABLE FOR BIOLOGISTS. EXAMPLES AND SAMPLE ANALYSES.

Investigating Common Descent: Formulating Explanations and Model

UNIT 3: GENETICS Chapter 9: Frontiers of Biotechnology

(a) (3 points) Which of these plants (use number) show e/e pattern? Which show E/E Pattern and which showed heterozygous e/e pattern?

Data Retrieval from GenBank

MODULE 1: INTRODUCTION TO THE GENOME BROWSER: WHAT IS A GENE?

Name Class Date. Practice Test

Lecture 10 Molecular evolution. Jim Watson, Francis Crick, and DNA

Protein Bioinformatics Part I: Access to information

Ohio s State Tests ITEM RELEASE SPRING 2017 BIOLOGY

Biology 40S: Course Outline Monday-Friday Slot 1, 8:45 AM 9:45 AM Room 311 Teacher: John Howden Phone:

user s guide Question 3

Chimp Chunk 3-14 Annotation by Matthew Kwong, Ruth Howe, and Hao Yang

Bioinformatics for Proteomics. Ann Loraine

Analysis of large deletions in human-chimp genomic alignments. Erika Kvikstad BioInformatics I December 14, 2004

BIO 152 Principles of Biology III: Molecules & Cells Acquiring information from NCBI (PubMed/Bookshelf/OMIM)

Annotating 7G24-63 Justin Richner May 4, Figure 1: Map of my sequence

Genome Sequence Assembly

Station 1: Fossil Records

Exercise1 ArrayExpress Archive - High-throughput sequencing example

Molecular Evolution. Lectures Papers Lab. Dr. Walter Salzburger. Structure of the course: Structure i


Week 7 - Natural Selection and Genetic Variation for Allozymes

September 2017 AP Biology calendar

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1

LESSON TITLE Decoding DNA. Guiding Question: Why should we continue to explore? Ignite Curiosity

Temperature sensitive gene expression: Two rules.

Sequencing the Human Genome

Exploring the Genetic Basis for Behavior. Instructor s Notes

Sequence Analysis Lab Protocol


Name: TOC#. Data and Observations: Figure 1: Amino Acid Positions in the Hemoglobin of Some Vertebrates

Humans Evolved in Response to a Challenge

Protein Synthesis. Lab Exercise 12. Introduction. Contents. Objectives

SENIOR BIOLOGY. Blueprint of life and Genetics: the Code Broken? INTRODUCTORY NOTES NAME SCHOOL / ORGANISATION DATE. Bay 12, 1417.

Ohio s State Tests ITEM RELEASE SPRING 2016 BIOLOGY

NON MENDELIAN GENETICS. DNA, PROTEIN SYNTHESIS, MUTATIONS DUE DECEMBER 8TH

Computational gene finding. Devika Subramanian Comp 470

The Transition to Life!

Big picture and history

Stem Cell Research Controversy

Tuesday, February 5 th. Today s Agenda: 1.Turn in pedigree assignment 2.Quiz! 3.KWL for mutations 4.Intro to mutations

Global analysis of exon creation versus loss and the role of alternative splicing in 17 vertebrate genomes

Allergy Warning: Students with know shellfish allergies should avoid all potential contact with shellfish samples.

Finding Genes, Building Search Strategies and Visiting a Gene Page

Transcription:

Pre-Lab Questions Name: 1. Use the following data to construct a cladogram of the major plant groups. Table 1: Characteristics of Major Plant Groups Organism Vascular Flowers Seeds Tissue Mosses 0 0 0 Pine trees 1 0 1 Flowering plants 1 1 1 Ferns 1 0 0 Total 3 1 2 2. GAPDH (Glyceraldehyde 3-phosphate dehydrogenase) is an enzyme that catalyzes the sixth step in glycolysis, an important reaction that produces molecules used in cellular respiration. The following data table shows the percentage similarity of this gene and the protein it expresses in humans versus other species. For example, according to the table, the GAPDH gene in chimpanzees is 99.6% identical to the gene found in humans, while the protein is identical. Table 2. Percentage Similarity Between the GAPDH Gene and Protein in Humans and Other Species Species Gene Percentage Similarity Protein Percentage Similarity Chimpanzee (Pan troglodytes) 99.6% 100% Dog (Canis lupus familiaris) 91.3% 95.2% Fruit fly (Drosophila melanogaster) (72.4% 76.7% Roundworm (Caenorhabditis elegans) 68.2% 74.3% 1

a. Why is the percentage similarity in the gene always lower than the percentage similarity in the protein for each of the species? b. In the space below, draw a cladogram depicting the evolutionary relationships among all five species (including humans) according to their percentage similarity in the GAPDH gene. Part I: Searching for Fossil Genes (adapted from David Form) You are the manager of a new animal food supply company. You need to find out if vitamin C needs to be included in new animal foods designed for dogs, cows, cats, mice and guinea pigs. Based on your research on the GULO gene, you will be able to determine if you need to provide vitamin C in these foods. Most mammals, such as mice, can produce their own vitamin C and therefore do not need a source of vitamin C in their diet. The GULO gene codes for an enzyme, L- gulonolactone oxidase, involved in vitamin C synthesis. The GULO gene is present in mice and most other mammals, but is either missing, or is nonfunctional, in some mammals. These animals cannot make their own vitamin C and must have this vitamin present in their diet. We will see if various mammals, including the primates, such as humans and chimps, have a functional GULO gene. A functional gene is able to produce a functional protein, in this case the GULO enzyme. In some mammals, the GULO gene is an example of a pseudogene. Pseudogenes are vestigial genes. That is, they were once functional in an ancestral species, but since they were no longer needed they accumulated mutations until they became non- functional. In many cases they evolve to the point where a protein can no longer be produced at all. Pseudogenes represent molecular evidence for evolution. As fossils are the remains of extinct organisms, pseudogenes are the remains of extinct genes. Procedure: The U. S. Government maintains a set of databases called GenBank, which contains nucleotide and amino acid sequences for those genes and proteins whose sequences have been determined. For your research we will use a computer program called BLAST. BLAST is able to search the GenBank databases. If you input a nucleotide 2

or amino acid sequence into BLAST, it will search for any known genes or proteins that are similar to the one that you entered. You will use BLAST to determine if the genomes of cows, pigs, humans and chimps contain functional GULO genes, or if they contain vestigial GULO pseudogenes, which do not result in production of the GULO enzyme. Part A. Is the GULO gene present in various mammals? 1. Copy the nucleotide sequence below for the mouse GULO gene. > mouse gulo gene CDS ATGGTCCATGGGTACAAAGGGGTCCAGTTCCAAAACTGGGCGAAGACCTATGGCTGCAGTCCAG AGATGTACTACCAGCCCACATCAGTGGGGGAGGTCAGAGAGGTGCTGGCCCTGGCCCGGCAGC AGAACAAGAAAGTGAAGGTGGTGGGTGGCGGCCACTCGCCTTCAGACATCGCCTGCACCGATG GCTTCATGATTCACATGGGCAAGATGAACCGGGTTCTCCAGGTGGACAAGGAGAAGAAGCAGG TCACAGTGGAAGCCGGTATCCTCCTGACTGACCTGCACCCACAGCTGGACAAGCATGGCCTGGC CCTGTCTAATCTGGGAGCCGTGTCTGATGTGACGGTTGGTGGCGTCATTGGGTCTGGAACACATA ACACCGGGATCAAGCACGGTATCCTGGCCACCCAGGTGGTGGCCCTGACCCTGATGAAGGCTGT GGAACAGTTCTGGAATGTTCTGAGTCAAGTAATGCAGATGTGTTCCAGGCTGCAAGGGTGCACC TGGGCTGCCTGGGTGTTATCCTCACTGTCACCCTGCAGTGTGTGCCACAGTTCCACCTTCTGGAG ACATCCTTTCCTTCGACCCTCAAGGAGGTCCTTGACAACCTGGACAGCCACCTGAAGAAGTCTG AGTACTTCCGCTTCCTCTGGTTTCCTCACAGTGAGAACGTCAGCATCATCTACCAAGATCACACC AACAAGGAGCCCTCCTCTGCATCTAACTGGTTTTGGGACTATGCCATTGGGTTCTACCTCCTGGA ATTCTTGCTCTGGACCAGCACCTACCTGCCACGCCTCGTGGGCTGGATCAACCGCTTCTTCTTCT GGCTGCTGTTCAACTGCAAGAAGGAGAGCAGCAACCTCAGCCACAAGATCTTCTCCTACGAGTG TCGCTTCAAGCAGCATGTCCAAGACTGGGCCATCCCCAGGGAGAAGACCAAGGAGGCCCTGCTG GAGCTAAAGGCCATGCTGGAGGCCCACCCCAAGGTGGTAGCCCACTACCCCGTGGAGGTGCGCT TCACCCGAGGTGATGACATCCTGCTGAGCCCGTGCTTCCAGAGGGACAGCTGCTACATGAACAT CATTATGTACAGGCCCTATGGGAAGGATGTGCCTCGGTTGGATTACTGGCTGGCCTATGAGACC ATCATGAAGAAGTTTGGAGGCAGGCCCCACTGGGCAAAGGCCCACAATTGCACCAGGAAGGAC TTTGAGAAAATGTACCCCGCCTTTCACAAGTTCTGTGACATCCGCGAGAAGCTGGACCCCACTG GAATGTTCTTGAATTCGTACCTGGAAAAGGTTTTCTACTAA 2. On the internet, go to www.ncbi.nlm.nih.gov. 3. From the menu at the top of the page, select BLAST. 4. From the BLAST menu, select nucleotide BLAST. It is under Basic BLAST. 5. Copy the mouse GULO sequence into the box under Enter Query Sequence. 6. Scroll down until you see Database. 7. Check the Others box. Change the database setting in the box to nucleotide collection. 8. In the Organism window, type in cow. When you see cow appear in the box, select it. 9. Scroll to the bottom and find the BLAST button. Above the BLAST button, you will see the optimize for box. Select optimize for somewhat similar sequences (BLASTn) 10. Scroll back down and hit the BLAST button. 11. When BLAST is done with its search, you can scroll down and see a colorized diagram indicating the degree of similarity of the BLAST hits to your mouse GULO nucleotide sequence. Red and pink/purple mean a good match, while green, blue and black indicate a poor match. If the colored line spans the entire length of the window, then the hit sequence matches the inquiry sequence along its entire length. We want to see a high quality match along a majority of the inquiry sequence. 3

12. Below the colorized diagram is a hit list of your results. This shows the quality of matching as an E- value. An E- value is the chance that the matchup may be due to a random matching of a sequence of bases. The smaller the E- value, the more confidence you can have in your matching. A good match should have a low E- value (red or pink line) and an alignment along a large segment of the sequence. 13. Note your result in the chart below. 14. Now start again and do BLAST searches for pig (Sus scrufa), human (Homo sapiens), cow (Bos taurus), guinea pig (Cavia porcellus), and chimpanzee (Pan troglodytes) GULO genes. Record your data in the chart. GULO gene BLAST results Species Mouse Present or Query cover % identity / E- value Absent? Present 100% 100 / 0.0 (Mus musculus) Dog (Canis familiaris) Cow (Bos Taurus) Pig (Sus scrufa) Guinea Pig (Cavia porcellus) Human (Homo sapiens) Chimp (Pan troglodytes) 4

PART B. Does the human GULO gene produce a functional protein? We will now use protein BLAST to search for GULO proteins in cows, pigs, humans and chimpanzees. 1. Copy the mouse GULO protein below > mouse GULO protein MVHGYKGVQFQNWAKTYGCSPEMYYQPTSVGEVREVLALARQQNKKVKVVGGGHSPSDIACTDG FMIHMGKMNRVLQVDKEKKQVTVEAGILLTDLHPQLDKHGLALSNLGAVSDVTVGGVIGSGTHNT GIKHGILATQVVALTLMKADGTVLECSESSNADVFQAARVHLGCLGVILTVTLQCVPQFHLLETSFPS TLKEVLDNLDSHLKKSEYFRFLWFPHSENVSIIYQDHTNKEPSSASNWFWDYAIGFYLLEFLLWTSTY LPRLVGWINRFFFWLLFNCKKESSNLSHKIFSYECRFKQHVQDWAIPREKTKEALLELKAMLEAHPK VVAHYPVEVRFTRGDDILLSPCFQRDSCYMNIIMYRPYGKDVPRLDYWLAYETIMKKFGGRPHWAK AHNCTRKDFEKMYPAFHKFCDIREKLDPTGMFLNSYLEKVFY 2. Go to www.ncbi.nlm.nih.gov. 3. Select BLAST from the menu at the top of the page. 4. From the BLAST menu, select protein BLAST. It is under Basic BLAST. 5. Copy the mouse GULO sequence into the box under Enter Query Sequence. 6. Scroll down until you see Organism. 7. In the Organism window, type in cow. When you see cow appear in the box, select it. 8. Scroll to the bottom and select the BLAST button. 9. When BLAST is done with its search, you can scroll down and see a chart of your results. Note your result in the chart below. 10. Now start again and do BLAST searches for pig (Sus scrufa), human (Homo sapiens), cow (Bos taurus), guinea pig (Cavia porcellus), and chimpanzee (Pan troglodytes) GULO genes. Record your data in the chart. 11. Look for proteins with the same name (, L- gulonolactone oxidase). If the GULO protein is not present, other, more distantly related proteins may come up. They will have a much lower score and a higher E- value. Note that the E- value represents the chance that the result is due a random matching of some amino acid sequences from both proteins. An E- value of 0 means a statistically perfect match. A good E- value should be much lower than e - 4. 12. Record your data in the chart. 5

GULO Protein BLAST results Species Mouse Present or E value Absent? Present 0.0 Query cover % identity (Mus musculus) Dog (Bos taurus) Pig (Sus scrufa) Cow (Bos Taurus) Guinea Pig (Cavia porcellus) Human (Homo sapiens) Chimp (Pan troglodytes) Critical Thinking Exercises 1. Why do you think that primates (monkeys, apes and humans) have lost the ability to produce vitamin C? (Hint: think about the diet of early primates). 6

2. Explain why the GULO gene in humans may be considered vestigial. 3. What can you infer about the GULO BLAST results between humans and chimps? 4. Your new pet food company is designing healthy foods for dogs, pigs, cows, mice and guinea pigs. From your results, to which types of feed will you suggest that the manufacturer add supplemental vitamin C? Justify your suggestion with a conclusive and specific result. Part 3: Search the databases and create your own phylogeny Step 1. You will need to watch the pre- lab tutorial posted on itunesu. https://www.youtube.com/watch?v=ijgdhsqrtki&list=hl1395263867 In this tutorial, I walk you through how to search a database (UniProt) and create a phylogenetic tree from selected protein sequences. Step 2. Choose a protein (see suggestions in table below) that interests you or select one from the list provided below. Search for that protein name in UniProt. www.uniprot.org Step 3. Select a total of 15 different organisms to use in the construction of your phylogeny. Copy and paste these protein sequences into a single document in FASTA format. It is crucial that the sequences are in FASTA format. Step 4. Change all of the names for your 15 sequences to a simpler name that reflects only the genus and species or common name. 7

Step 5. Go to www.phylogeny.fr and under Phylogeny Analysis, select one- click analysis to create your tree. Feel free to name your analysis. Paste in all 15 sequences into the window and click submit. Step 6. Generate a pdf of your phylogeny and paste it into the space at the end of this document. Step 7. Go to the T- coffee website: http://tcoffee.vital- it.ch/apps/tcoffee/do:regular Paste your FASTA file of sequences into the sequences to align window and submit to generate a nice- looking alignment of the sequences. Take s screenshot(s) of the alignment and paste this into the bottom of this document as well. Step 8. At the end of this document, write a one- page conclusion/interpretation of your phylogeny and sequence alignment that addresses the following questions: - What does your phylogeny suggest about conserved core processes and what inferences can you make about the common ancestry of the organisms you selected to analyze? - What did your initial search results suggest about how widely distributed this gene was within and across the domains of life? - What can you tell about the degree of sequence conservation in the alignment that you generated in T- coffee? Does this alignment corroborate your tree results? Does anything stand out in the alignment as odd? - What other data, either morphological, genetic, or both, could you add to your analyses that could improve or extend the phylogenetic tree? - Elaborate on what else you learned or felt was most interesting about this lab. Suggested proteins to explore Aquaporins ATP synthase Catalase Actin GAPDH Keratin Myosin Pax1 Hox genes Ubiquitin PITX1 MC1R Cytochrome C Cytochrome B Cellulose synthase (plants) Callose synthase (plants) Lipoxygenase Superoxide dismutase Peroxidase rhodopsin Many others to choose from, so, be creative! 8

9

1 0

1 1

Name 1 2