Hands-On Four Investigating Inherited Diseases

Similar documents
Investigating Inherited Diseases

BIOINF525: INTRODUCTION TO BIOINFORMATICS LAB SESSION 1

user s guide Question 1

MODULE 1: INTRODUCTION TO THE GENOME BROWSER: WHAT IS A GENE?

COMPUTER RESOURCES II:

BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

Week 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

Annotation Walkthrough Workshop BIO 173/273 Genomics and Bioinformatics Spring 2013 Developed by Justin R. DiAngelo at Hofstra University

MODULE 5: TRANSLATION

SeattleSNPs Interactive Tutorial: Database Inteface Entrez, dbsnp, HapMap, Perlegen

Chapter 2: Access to Information

FINDING GENES AND EXPLORING THE GENE PAGE AND RUNNING A BLAST (Exercise 1)

DNA RNA Protein Trait Protein Synthesis (Gene Expression) Notes Proteins (Review) Proteins make up all living materials

Lab Week 9 - A Sample Annotation Problem (adapted by Chris Shaffer from a worksheet by Varun Sundaram, WU-STL, Class of 2009)

Tutorial for Stop codon reassignment in the wild

FUNCTIONAL BIOINFORMATICS

Genomic Annotation Lab Exercise By Jacob Jipp and Marian Kaehler Luther College, Department of Biology Genomics Education Partnership 2010

Bioinformatics Translation Exercise

DNA is normally found in pairs, held together by hydrogen bonds between the bases

Finding Genes, Building Search Strategies and Visiting a Gene Page

Genes and gene finding

Exploring genomes - A Treasure Hunt on the Internet

INTRODUCTION TO BIOINFORMATICS. SAINTS GENETICS Ian Bosdet

Overview: GQuery Entrez human and amylase Search Pubmed Gene Gene: collected information about gene loci AMY1A Genomic context Summary

Collect, analyze and synthesize. Annotation. Annotation for D. virilis. Evidence Based Annotation. GEP goals: Evidence for Gene Models 08/22/2017

Collect, analyze and synthesize. Annotation. Annotation for D. virilis. GEP goals: Evidence Based Annotation. Evidence for Gene Models 12/26/2018

Gene mutation and DNA polymorphism

Finding Genes, Building Search Strategies and Visiting a Gene Page

Identifying Genes and Pseudogenes in a Chimpanzee Sequence Adapted from Chimp BAC analysis: TWINSCAN and UCSC Browser by Dr. M.

UCSC Genome Browser. Introduction to ab initio and evidence-based gene finding

The University of California, Santa Cruz (UCSC) Genome Browser

user s guide Question 3

user s guide Question 3

Interpretation of sequence results

Disease and selection in the human genome 3

Materials Protein synthesis kit. This kit consists of 24 amino acids, 24 transfer RNAs, four messenger RNAs and one ribosome (see below).

Annotation Practice Activity [Based on materials from the GEP Summer 2010 Workshop] Special thanks to Chris Shaffer for document review Parts A-G

Annotation of a Drosophila Gene

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014

You use the UCSC Genome Browser ( to assess the exonintron structure of each gene. You use four tracks to show each gene:

ab initio and Evidence-Based Gene Finding

Aaditya Khatri. Abstract

Protein Bioinformatics Part I: Access to information

Biotechnology Explorer

Identification of Single Nucleotide Polymorphisms and associated Disease Genes using NCBI resources

Molecular Genetics of Disease and the Human Genome Project

Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide.

BLAST. Subject: The result from another organism that your query was matched to.

Introduction to Bioinformatics Dr. Robert Moss

BIOSTAT516 Statistical Methods in Genetic Epidemiology Autumn 2005 Handout1, prepared by Kathleen Kerr and Stephanie Monks


HC70AL SUMMER 2014 PROFESSOR BOB GOLDBERG Gene Annotation Worksheet

SAY IT WITH DNA: Protein Synthesis Activity by Larry Flammer

ENGR 213 Bioengineering Fundamentals April 25, A very coarse introduction to bioinformatics

SENIOR BIOLOGY. Blueprint of life and Genetics: the Code Broken? INTRODUCTORY NOTES NAME SCHOOL / ORGANISATION DATE. Bay 12, 1417.

NAME:... MODEL ANSWER... STUDENT NUMBER:... Maximum marks: 50. Internal Examiner: Hugh Murrell, Computer Science, UKZN

MAKING WHOLE GENOME ALIGNMENTS USABLE FOR BIOLOGISTS. EXAMPLES AND SAMPLE ANALYSES.

Section 10.3 Outline 10.3 How Is the Base Sequence of a Messenger RNA Molecule Translated into Protein?

BIO 152 Principles of Biology III: Molecules & Cells Acquiring information from NCBI (PubMed/Bookshelf/OMIM)

DNA sentences. How are proteins coded for by DNA? Materials. Teacher instructions. Student instructions. Reflection

The Making of the Fittest: Natural Selection and Adaptation

Browser Exercises - I. Alignments and Comparative genomics

UNIT 7. DNA Structure, Replication, and Protein Synthesis

Figure S1. Characterization of the irx9l-1 mutant. (A) Diagram of the Arabidopsis IRX9L gene drawn based on information from TAIR (the Arabidopsis

Chimp BAC analysis: Adapted by Wilson Leung and Sarah C.R. Elgin from Chimp BAC analysis: TWINSCAN and UCSC Browser by Dr. Michael R.

BIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP

Human Genomics. Higher Human Biology

Flow of Genetic Information

Lecture 10, 20/2/2002: The process of solution development - The CODEHOP strategy for automatic design of consensus-degenerate primers for PCR

SENIOR BIOLOGY. Blueprint of life and Genetics: the Code Broken? INTRODUCTORY NOTES NAME SCHOOL / ORGANISATION DATE. Bay 12, 1417.

Introduction to DNA-Sequencing

MODULE TSS1: TRANSCRIPTION START SITES INTRODUCTION (BASIC)

Lecture 2: Biology Basics Continued. Fall 2018 August 23, 2018

Genes & Gene Finding

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

Why study sequence similarity?

Annotating 7G24-63 Justin Richner May 4, Figure 1: Map of my sequence

Supplemental Data Supplemental Figure 1.

Project Manual Bio3055. Lung Cancer: Cytochrome P450 1A1

Lecture 19A. DNA computing

The Gene Gateway Workbook

User s Manual Version 1.0

Using the Genome Browser: A Practical Guide. Travis Saari

GENETICS and the DNA code NOTES

Analyzing an individual sequence in the Sequence Editor

Sequence Alignments. Week 3

Guided tour to Ensembl

Understanding Genes & Mutations. John A Phillips III May 16, 2005

Gene-centered databases and Genome Browsers

Gene-centered databases and Genome Browsers

Electronic Supplementary Information

Chem Lecture 1 Introduction to Biochemistry

Ensembl workshop. Thomas Randall, PhD bioinformatics.unc.edu. handouts, papers, datasets

Which Process Is The First Step In Making A Protein From Dna Instructions >>>CLICK HERE<<<

Computational gene finding

Multiplexing Genome-scale Engineering

ORFs and genes. Please sit in row K or forward

Bacteriophage Evolution Lab Week

MODULE TSS2: SEQUENCE ALIGNMENTS (ADVANCED)

Transcription:

Hands-On Four Investigating Inherited Diseases The purpose of these exercises is to introduce bioinformatics databases and tools. We investigate an important human gene and see how mutations give rise to inherited diseases. You will learn how to: Translate a DNA sequence into a sequence of amino acids Look for mutations in this sequence and determine how they cause inherited human diseases. A) Translating a DNA sequence Points to remember The genetic code is a triplet code, groups of 3 letters (bases) code for amino acids The coding sequence (CDS) of a gene generally begins with the start codon ATG. ATG encodes the amino acid methionine (MET, M) There are three stop codons (TAA, TGA, TAG) which indicate the end of proteins Let us try this exercise by hand. Consider the following DNA sequence: >CDS of human beta globin atggtgcacctgactcctgaggagaagtctgccgttactgccctgtggggcaaggtgaac gtggatgaagttggtggtgaggccctgggcaggttgctggtggtctacccttggacccag aggttctttgagtcctttggggatctgtccactcctgatgctgttatgggcaaccctaag gtgaaggctcatggcaagaaagtgctcggtgcctttagtgatggcctggctcacctggac aacctcaagggcacctttgccacactgagtgagctgcactgtgacaagctgcacgtggat cctgagaacttcaggctcctgggcaacgtgctggtctgtgtgctggcccatcactttggc aaagaattcaccccaccagtgcaggctgcctatcagaaagtggtggctggtgtggctaat gccctggcccacaagtatcactaa 1) What is the nucleotide in position 20 in this sequence (HBB)? 2) Translate the first 27 bases of the sequence: atg gtg cac ctg act cct gag gag aag Computers are generally quicker and more accurate. Go to http://www.cs.sjsu.edu/~khuri/aua_2016 and click on HandsOn_Sequences.txt Copy onto the clipboard the sequence (only): CDS of human beta globin Go to the ExPasy translate tool at http://web.expasy.org/translate/ Paste the sequence in the box Go to Output format: (under the box) and choose Includes nucleotide sequence Click the TRANSLATE SEQUENCE button. www.usps.com/communications/news/stamps/2004/sr04_059.htm and www.sankalpindia.net

What you should see: The DNA sequence with the amino acid sequence written out underneath it. Bioinformaticians mostly use the "single letter code", in which each amino acid is represented by a single letter. There are six possible translations because there are six possible reading frames depending on where you start reading the sequence. The first reading frame with a Methionine at the beginning and a dash (indicating a stop codon) at the end is the correct translation. Let us save the sequence for later use: Click on the link 5'3' Frame 1 Now you should see only the amino acid sequence (without the DNA sequence). Copy the whole amino acid sequence without the stop codon (Stop) to the clipboard. Save the file since we will use it in the next part of the problem when we look at mutations. B) Looking at the effects of mutations (Part One) Mutations in the DNA sequence affect the resulting protein in different ways. In this exercise you will translate the DNA sequence with a mutation found in individuals with the disease called Sickle Cell Anemia and compare the resulting protein with the one you found by translating the human beta globin gene, HBB (the sequence of part A). Consider the following variant of the human beta globin gene: >CDS of human beta globin mutant 1 atggtgcacctgactcctgtggagaagtctgccgttactgccctgtggggcaaggtgaac gtggatgaagttggtggtgaggccctgggcaggttgctggtggtctacccttggacccag aggttctttgagtcctttggggatctgtccactcctgatgctgttatgggcaaccctaag gtgaaggctcatggcaagaaagtgctcggtgcctttagtgatggcctggctcacctggac aacctcaagggcacctttgccacactgagtgagctgcactgtgacaagctgcacgtggat cctgagaacttcaggctcctgggcaacgtgctggtctgtgtgctggcccatcactttggc aaagaattcaccccaccagtgcaggctgcctatcagaaagtggtggctggtgtggctaat gccctggcccacaagtatcactaa 3) What is the nucleotide in position 20 in this sequence?. In this case, a single nucleotide has been changed for another: an A changed into a T in position 20. Let us study the effects of this substitution. 4) Translate the first 27 bases of the mutant sequence: atg gtg cac ctg act cct gtg gag aag Now let us use the computer to translate the mutant sequence. 2

Go to http://www.cs.sjsu.edu/~khuri/aua_2016 and click on HandsOn_Sequences.txt Using the mouse, copy the mutant sequence (only): the human beta globin CDS mutant 1. Go to the ExPasy translate tool at http://web.expasy.org/translate/ Paste the sequence in the box Go to Output format (under the box) and choose Includes nucleotide sequence Click the TRANSLATE SEQUENCE button. Click on the link 5'3' Frame 1. Now you should see only the amino acid sequence (without the DNA sequence). Save the amino acid sequence in the same file as part A). 5) Can you find the difference between the two amino acid sequences?. It is hard to see the difference between the two sequences. The difference is a change of one amino acid for another. To help you see the difference, you can make an alignment of the two sequences, and then the difference will stand out. Go to NCBI: http://www.ncbi.nlm.nih.gov Click on BLAST under Popular Resources on the right-hand side of the page Choose Align two (or more) sequences using BLAST (bl2seq) under Specialized BLAST Click on the blastp tab at the top of the page Paste the amino acid sequence obtained from the translation of the CDS of the human beta globin in the first window Paste the amino acid sequence from the translation of the CDS of the human beta globin mutant 1 in the second window Click on the blue BLAST button at the bottom of the page. Here is the alignment you will see: Query 1 MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK 60 MVHLTP EKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK Sbjct 1 MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK 60 Query 61 VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG 120 VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG Sbjct 61 VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG 120 Query 121 KEFTPPVQAAYQKVVAGVANALAHKYH 147 KEFTPPVQAAYQKVVAGVANALAHKYH Sbjct 121 KEFTPPVQAAYQKVVAGVANALAHKYH 147 Notice that the Query sequence (normal protein) and the Sbjct sequence (mutant 1 protein) perfectly align everywhere except for the seventh position. This replacement of a hydrophilic glutamate residue (E) in position 7, with a hydrophobic valine residue (V) creates a hydrophobic spot on the outside of the protein. This causes the hemoglobin molecules to stick together with very serious consequences. We investigate the consequences in the next hands-on exercises. 3

C) Looking at the effects of mutations (Part Two) In this exercise, you will translate the DNA sequence with a mutation found in individuals with the disease called Beta Thalassemia and compare the resulting protein with the one you found by translating the human beta globin gene, HBB (the sequence of part A). Consider the following variant of the human beta globin gene: >CDS of human beta globin mutant 2 atggtgcacctgactcctgggagaagtctgccgttactgccctgtggggcaaggtgaac gtggatgaagttggtggtgaggccctgggcaggttgctggtggtctacccttggacccag aggttctttgagtcctttggggatctgtccactcctgatgctgttatgggcaaccctaag gtgaaggctcatggcaagaaagtgctcggtgcctttagtgatggcctggctcacctggac aacctcaagggcacctttgccacactgagtgagctgcactgtgacaagctgcacgtggat cctgagaacttcaggctcctgggcaacgtgctggtctgtgtgctggcccatcactttggc aaagaattcaccccaccagtgcaggctgcctatcagaaagtggtggctggtgtggctaat gccctggcccacaagtatcactaa A single nucleotide has been deleted from the wild-type (the CDS of human beta globin of Part A) to produce the above mutant. To find out what effect this mutation has on the beta globin protein, we will need to translate it into an amino acid sequence as we did before. Let us first translate by hand the first 57 bases: atg gtg cac ctg act cct ggg aga agt ctg ccg tta ctg ccc tgt ggg gca agg tga Now let us use the computer to translate the mutant sequence. Go to http://www.cs.sjsu.edu/~khuri/aua_2016 and click on HandsOn_Sequences.txt Using the mouse, copy the mutant sequence (only): the human beta globin CDS mutant 2. Go to the ExPasy translate tool at http://web.expasy.org/translate/ Paste the sequence in the box Go to Output format (under the box) and choose Includes nucleotide sequence Click the TRANSLATE SEQUENCE button. Click on the link 5'3' Frame 1. Now you should see only the amino acid sequence (without the DNA sequence). Save the amino acid sequence in the same file as part A). Carefully examine the second (the mutant 2) amino acid sequence and compare it to the one you obtained in part A. 6) Can you find the difference between the two amino acid sequences?. 4

What you should see: the first 6 amino acids are the same these are followed by 12 amino acids, namely: GRSLPLLPCGAR which were not in the normal protein followed by a stop codon after the first stop codon, we have amino acids, which are different from the normal beta globin sequence, however once a stop codon has been reached the rest of the sequence is not important. This type of mutation is called a frame shift mutation and results in truncated or shortened proteins because of the stop codon. Mutations like this are common in people with beta thalassemia. As humans have two copies of each gene, people with this mutation are likely to have low levels of beta globin produced from one normal copy of the gene. D) Using the UCSC Genome Browser We now use the genome database available through the University of California at Santa Cruz (UCSC) to investigate the beta globin gene with and without the mutations. Go to the UCSC Genome Browser: http://genome.ucsc.edu Click on the Genome Browser link in the left side navigation bar. This will bring you to the Human (Homo sapiens) Genome Browser Gateway. This tool allows you to search the human genome and retrieve gene information. Keep the default values of Mammal under group and Human under genome. Keep the default version Dec. 2013 (GRCh38/hg38) under assembly. Type HBB (this is the standard notation for the Hemoglobin beta chain) in the search term window. Click on the submit button. The search results are returned on a new page. We are interested in the RefSeq Genes. This is the link to the reference sequence : the sequence that is considered the authoritative version. Click on the HBB reference sequence link: HBB at chr11:5225466-5227071 to view the HBB gene on chromosome 11. You will be taken to a new page that displays a graphical window in the center. Near the top of the page is an illustration of the entire human chromosome 11. 7) What is the red line on the illustration of human chromosome 11 indicating?. One of the top lines in the main panel (the graphical window) represents the GENCODE V22 Comprehensive Transcript Set. Click on the blue HBB to the left of GENCODE V22 Comprehensive Transcript Set. 5

This brings you to the Human Gene HBB (uc001mae.2) Description and Page Index page which contains a lot of information on HBB and annotations compiled from many different sources and websites. 8) According to RefSeq Summary (NM_000518), what is the order of the genes in the beta-globin cluster?. 9) Does this page include results obtained from microarray analysis?. 10) Under what heading does this page have graphical representations of the secondary structure of the 5 UTR and 3 UTR?. 11) a) What is the size of the 5 UTR?. b) What is the size of the 3 UTR?. Go back to the Genome Browser main panel that showed the HBB gene on the map of chromosome 11. The dark blue line in the main panel represents the HBB reference sequence RefSeq Genes. The thicker parts of the line are exons and the thin lines are introns. The direction of the arrow shows the direction of transcription. Above the chromosome illustration are a series of move, zoom in and zoom out navigational tools. Let us zoom out 30x (i.e., 30 times). Click on 3x to the right of zoom out Click on 10x to the right of zoom out Move a little to the right with one click of the right arrow >. This brings up a cluster of interesting HB genes (HBD, HBG1, HBG2) in the main panel. 12) What are the HBD and HBG1, HBG2 genes on human chromosome 11?. 13) Hypothesize why these HB genes are found in a cluster.. Go back to the Genome Browser main panel that showed the HBB gene on the map of chromosome 11. Click on the blue HBB hyperlink under RefSeq Genes (in the main window). This brings you to the RefSeq Gene HBB page where most of the data is from NCBI. Scroll down to the link Genomic Sequence from assembly under Links to sequence:. Click on the Genomic Sequence from assembly link. This brings you to a formatting page that allows you to structure the way you want the DNA sequence to be displayed. Most of the default settings are fine. Read (and do not change) the selected ones. 6

Make sure the radio button for Exons in upper case, everything else in lower case under Sequence Formatting Options: is selected. Hit the submit button at the bottom of the page. Now you finally have the HBB gene sequence. 14) a) How many exons does the gene have?. b) How many introns does the gene have?. 15) What are the first 2 bases of each intron?. 16) What are the last 2 bases of each intron?. 17) Are there more DNA bases in the introns or the exons of the HBB gene?. Let us now identify the DNA sequence for the sickle cell mutant hemoglobin allele. Go back to the Genome Browser main panel that showed the HBB gene on the map of chromosome 11. To get the sequence of the mutant allele, we need to add information about genetic variations to the map (before we zoomed out). Scroll down to the section entitled Variation. This is one of the tracks of the UCSC Genome Browser. You might have to click on + to the left of Variation to expand this track. Click on Common SNPs(144). SNPs are single nucleotide polymorphisms, in other words, single base mutations. In the new page: Common SNPs(144) Track Settings, expand Coloring Options by clicking on the + next to it. 18) What is the default color of Coding Synonymous SNPs?. 19) What is the default color of Coding Non-Synonymous SNPs?. Go back to the Genome Browser main panel that showed the HBB gene on the map of chromosome 11. Scroll down to the section entitled Variation. Set the Common SNPs(144) pulldown to pack. Hit the refresh button to the right of Variation. Now the main panel has more information in it. At the bottom of the main panel there is a section entitled Simple Nucleotide Polymorphisms. Each of these is a known mutation of the human HBB gene. We are interested in the rs334 polymorphism. This is the sickle cell mutation. The horizontal bar to the right of rs334 indicates the location of the SNP in HBB. You might have to be patient while looking for rs334 among all the SNPs. 20) Do we have a synonymous or non-synonymous mutation?. Why?. 21) Is the sickle cell mutation in an intron or an exon?. 7

22) Is the mutation at the beginning or end of the gene?. Click on the rs334 polymorphism. We are taken to a page entitled Simple Nucleotide Polymorphisms (dbsnp 144) Found in >= 1% of Samples summarizing the information about this SNP. It contains the following three lines: Strand: - Observed: A/C/G/T Reference allele: A 23) What is meant by Strand: -?. Was it expected?. Why?. 24) What is meant by Observed: A/C/G/T?. Was it expected?. Why?. 8