Protein Bioinformatics PH Final Exam

Similar documents
Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

Sequence Databases and database scanning

BIOINF525: INTRODUCTION TO BIOINFORMATICS LAB SESSION 1

Teaching Principles of Enzyme Structure, Evolution, and Catalysis Using Bioinformatics

COMP 555 Bioalgorithms. Fall Lecture 1: Course Introduction

This practical aims to walk you through the process of text searching DNA and protein databases for sequence entries.

ONLINE BIOINFORMATICS RESOURCES

Homework Project Protein Structure Prediction Contact me if you have any questions/problems with the assignment:

Protein Sequence Analysis. BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl)

COMPARATIVE ANALYSIS AND INSILICO CHARACTERIZATION OF LEA, DREB AND OsDHODH1 GENES INVOLVED IN DROUGHT RESISTANCE OF ORYZA SATIVA

Protein Bioinformatics Part I: Access to information

Molecular Modeling 9. Protein structure prediction, part 2: Homology modeling, fold recognition & threading

BIO4342 Lab Exercise: Detecting and Interpreting Genetic Homology

BMB/Bi/Ch 170 Fall 2017 Problem Set 1: Proteins I

Name Campbell Chapter 16 The Molecular Basis of Inheritance

DNA Glycosylase Exercise

Structural bioinformatics

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

DNA Repair Protein Exercise

CSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004

BME 110 Midterm Examination

Bioinformatics Practical Course. 80 Practical Hours

systemsdock Operation Manual

AP Biology Book Notes Chapter 3 v Nucleic acids Ø Polymers specialized for the storage transmission and use of genetic information Ø Two types DNA

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

BIRKBECK COLLEGE (University of London)

VALLIAMMAI ENGINEERING COLLEGE

Applied Bioinformatics

Browser Exercises - I. Alignments and Comparative genomics

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases

Metabolic modeling (4cr)

CHAPTER 21 LECTURE SLIDES

Genome Resources. Genome Resources. Maj Gen (R) Suhaib Ahmed, HI (M)

SUPPLEMENTARY INFORMATION

INTERDISCIPLINARY INVESTIGATION (IDI)-CLASSROOM

BIOINFORMATICS Introduction

Protein structure. Wednesday, October 4, 2006

Tutorial. Visualize Variants on Protein Structure. Sample to Insight. November 21, 2017

COMPUTER RESOURCES II:

Red and black licorice sticks, colored marshmallows or gummy bears, toothpicks and string. (Click here for the Candy DNA Lab Activity)

KEGG: Kyoto Encyclopedia of Genes and Genomes

A tutorial introduction into the MIPS PlantsDB barley&wheat database instances

Nucleic acids. How DNA works. DNA RNA Protein. DNA (deoxyribonucleic acid) RNA (ribonucleic acid) Central Dogma of Molecular Biology

Hmwk 6. Nucleic Acids

UNIT 24: Nucleic Acids Essential Idea(s): The structure of DNA allows efficient storage of genetic information.

CS313 Exercise 1 Cover Page Fall 2017

Peptide libraries: applications, design options and considerations. Laura Geuss, PhD May 5, 2015, 2:00-3:00 pm EST

How is smoking linked to lung cancer? Learn about one of the genetic risk factors for lung cancer (the KRAS oncogene) using bioinformatics tools

Searching and Analysing 3D Protein-ligand Structures using PDBe web services. Abhik Mukhopadhyay.

An insilico Approach: Homology Modelling and Characterization of HSP90 alpha Sangeeta Supehia

September 19, synthesized DNA. Label all of the DNA strands with 5 and 3 labels, and clearly show which strand(s) contain methyl groups.

FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE

Web based Bioinformatics Applications in Proteomics. Genbank

Warm Up #15: Using the white printer paper on the table:

(Due Sept 9 th ) Problem Set 2

The Double Helix. DNA and RNA, part 2. Part A. Hint 1. The difference between purines and pyrimidines. Hint 2. Distinguish purines from pyrimidines

Introduction to Bioinformatics

Investigating Inherited Diseases

Introduction. CS482/682 Computational Techniques in Biological Sequence Analysis

Metabolism and metabolic networks. Metabolism is the means by which cells acquire energy and building blocks for cellular material

What s New in Discovery Studio 2.5.5

Hands-On Four Investigating Inherited Diseases

1) The penicillin family of antibiotics, discovered by Alexander Fleming in 1928, has the following general structure: O O

Textbook Reading Guidelines

Supplementary Figure 1. Processing and quality control for recombinant proteins.

CSE/Beng/BIMM 182: Biological Data Analysis. Instructor: Vineet Bafna TA: Nitin Udpa

Biochemistry 661 Your Name: Nucleic Acids, Module I Prof. Jason Kahn Exam I (100 points total) September 23, 2010

MARINE BIOINFORMATICS & NANOBIOTECHNOLOGY - PBBT305

Clamping down on pathogenic bacteria how to shut down a key DNA polymerase complex

Lesson Overview DNA Replication

2. The dropdown box has a number of databases that are searchable. Select the gene option and search for dihydrofolate reductase.

Nucleic acids. What important polymer is located in the nucleus? is the instructions for making a cell's.

Protein Structure Prediction. christian studer , EPFL

Protein Folding Problem I400: Introduction to Bioinformatics

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

Workunit: P Title:MT

Introduction to molecular docking

Nature Methods: doi: /nmeth Supplementary Figure 1. Construction of a sensitive TetR mediated auxotrophic off-switch.

Class Information. Introduction to Genome Biology and Microarray Technology. Biostatistics Rafael A. Irizarry. Lecture 1

Nucleic Acids. OpenStax College. 1 DNA and RNA

If you wish to have extra practice with swiss pdb viewer or to familiarize yourself with how to use the program here is a tutorial:

i. Monomers for which class(s) of macromolecules always have phosphorous? Circle all that apply. Carbohydrates Proteins Lipids DNA RNA

Introduction to Bioinformatics

Bioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras. Lecture - 5a Protein sequence databases

Inserting genes into plasmids

Conformational properties of enzymes

Course Competencies Template Form 112

Course Competencies Template Form 112

Homology Modeling of Mouse orphan G-protein coupled receptors Q99MX9 and G2A

Please write your name or student ID number on every page.

Sequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University

DNA is the genetic material. DNA structure. Chapter 7: DNA Replication, Transcription & Translation; Mutations & Ames test

Using the Genome Browser: A Practical Guide. Travis Saari

CS612 - Algorithms in Bioinformatics

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

CHEM 436 / 630. Molecular modelling of proteins. Winter 2018 Term. Instructor: Guillaume Lamoureux Concordia University, Montréal, Canada

Zool 3200: Cell Biology Exam 3 3/6/15

Two Mark question and Answers

Bioinformatics Take Home Test #1

Transcription:

Name (please print) Protein Bioinformatics PH260.655 Final Exam => take-home questions => open-note => please use either * the Word-file to type your answers or * print out the PDF and hand-write your answers => due: May 20th, 2010, 3:30 pm GOOD LUCK!

In HW#5, you learned about the importance of the Pentose-Phosphate Pathway and one of its key enzymes for Agrobacterium tumefaciens. The other key enzyme of this pathway which was detected highly abundant in the ph 7 mass spectrometry data set is Transketolase. You become curious about the nature of this protein and start to explore it a bit more, e.g. in order to have enough structure-function related information for subsequent cloning / mutagenesis / purification experiments: Please retrieve the protein sequence of Transketolase (TktA) from A. tumefaciens C58 in fasta format from BioCyc. 1) While looking the protein sequence up in BioCyc, you become aware of the gene reaction schematic, and you see that Agrobacterium actually harbors three genes associated with a transketolase function: Tkt, tktb and tkta. They are thought to have originated from gene duplication, but while Tkt and TktB are located on the main, circular chromosome, TktA is located on a second, linear chromosome. Please circle the correct expression below: 1a) tkt and tktb are orthologs paralogs? 1b) tktb and tkta are orthologs paralogs? 2) Now find out about some of TktA's physical properties, choosing an appropriate proteomic tools program, e.g. from the ExPASy server. 2a) Molecular weight: 2b) pi: 2c) Average hydropathy: 2d) Please briefly rank the value you found in 2c by marking one of the words below: hydrophobic amphiphilic hydrophilic

3) Of course, you need to know whether there is a crystal structure known for your protein or at least for a homolog: Search for structures using the PDB s advanced search option for FASTA / BLAST sequences. 3a) Please briefly explain the significance of e-values. 3b) Which e-value cut-off do you have to choose in order to obtain less than 30 results? 4) Alas, there is no Transketolase structure from Agrobacterium in the database. Instead, you decide to have a closer look at one of the E.coli structures with a good e-value, 2R5N, because it contains several ligands. => Please download the fasta sequence associated with 2R5N and the PDB file of 2R5N. 4a) There is 1 polymers and 2 chains reported in the 2R5N structure. What does that mean for TktA's oligomerization state and the number of assigned domains (see Derived Data section)? 4b) Look at the Derived Data section and summarize in general terms or with the help of a schematic what SCOP, CATH and Pfam classifications tell you about the enzyme s overall folding architecture and functional domains. => Please state in simple terms which domains are predicted to be involved in binding the cofactor TPP (= Thiamine pyrophosphate = Thiamine diphosphate = ThDP)? thiazolium ring aminopyrimidine moiety 4c) Derive the annotated domain boundaries from the "Sequence" section (use the SCOP annotation) and list the respective residue numbers and names below:

5) You find one page of an old publication about Transketolase function, showing the multiple sequence alignment below. You vaguely recall that several entirely conserved Histidine residues are important for catalysis, but you cannot look the paper up because the library server is on maintenance till the end of the week. CLUSTAL 2.0.12 multiple sequence alignment 2R5N_Transketolase DAVQKAKSGHPGAPMGMADIAEVLWRDFLKHNPQNPSWADRDRFVLSNGHGSMLIYSLLH 76 Y. pestis DAVQKAKSGHPGAPMGMADIAEVLWRDYLNHNPTNPHWADRDRFVLSNGHGSMLIYSLLH 76 V. P. aeroginosa DAVQKANSGHPGAPMGMADIAEVLWRDYMQHNPSNPQWANRDRFVLSNGHGSMLIYSLLH 76 A. tumefaciens DAVEKANSGHPGLPMGAADVATVLFTRYLKFDPKAPLWADRDRFVLSAGHGSMLLYSLLY 79 M. tuberculosis DAVQKVGNGHPGTAMSLAPLAYTLFQRTMRHDPSDTHWLGRDRFVLSAGHSSLTLYIQLY 120 *.*:*..****.*. * :*.*: :..:*. *.******* **.*: :* *: 2R5N_Transketolase LTGYD-LPMEELKNFRQLHSKTPGHPEVGYTAGVETTTGPLGQGIANAVGMAIAEKTLAA 135 Y. pestis LTGYD-LPMEELKNFRQLHSKTPGHPEYGYTAGVETTTGPLGQGIANAVGFAIAERTLGA 135 V. cholerae LSGYE-LSIDDLKNFRQLHSKTPGHPEYGYAPGIETTTGPLGQGITNAVGMAIAEKALAA 164 P. aeroginosa LTGYD-LGIEDLKNFRQLNSRTPGHPEYGYTAGVETTTGPLGQGIANAVGMALAEKVLAA 135 A. tumefaciens LTGYEDMTIDEIKRFRQFGSKTAGHPEYGHATGIETTTGPLGQGIANAVGMAIAERKLEE 139 M. tuberculosis LGGFG-LELSDIESLRTWGSKTPGHPEFRHTPGVEITTGPLGQGLASAVGMAMASRYERG 179 * *: : :.::: :* *:*.**** ::.*:* ********::.***:*:*.: 2R5N_Transketolase QFN---RPGHDIVDHYTYAFMGDGCMMEGISHEVCSLAGTLKLGKLIAFYDDNGISIDGH 192 Y. pestis QFN---RPGHDIVDHHTYAFMGDGCMMEGISHEVCSLAGTMKLGKLTAFYDDNGISIDGH 192 V. cholerae QFN---KPGHDIVDHFTYVFMGDGCLMEGISHEACSLAGTLGLGKLIAFWDDNGISIDGH 221 P. aeroginosa QFN---RDGHAVVDHYTYAFLGDGCMMEGISHEVASLAGTLRLNKLIAFYDDNGISIDGE 192 A. tumefaciens EF------GSDLQSHFTYVLCGDGCLMEGISHEAIALAGHLKLNKLVLFWDDNNITIDGE 193 M. tuberculosis LFDPDAEPGASPFDHYIYVIASDGDIEEGVTSEASSLAAVQQLGNLIVFYDRNQISIEDD 239 * *.*. *.:.** : **:: *. :**. *.:* *:* * *:*:.. 2R5N_Transketolase VEGWFTDDTAMRFEAYGWHVIRDIDGHDAASIKRAVEEARAVTDKPSLLMCKTIIGFGSP 252 Y. pestis VEGWFTDDTAARFEAYGWHVVRGVDGHNADSIKAAIEEAHKVTDKPSLLMCKTIIGFGSP 252 V. cholerae VEGWFSDDTPKRFEAYGWHVIPAVDGHDADAINAAIEAAKAETSRPTLICTKTIIGFGSP 281 P. aeroginosa VHGWFTDDTPKRFEAYGWQVIRNVDGHDADEIKTAIDTARKS-DQPTLICCKTVIGFGSP 251 A. tumefaciens VGLSDSTDQIARFQAVHWNTIR-VDGHDPDAIAAAIEAAQKS-DRPTFIACKTVIGFGAP 251 M. tuberculosis TNIALCEDTAARYRAYGWHVQEVEGGENVVGIEEAIANAQAVTDRPSFIALRTVIGYPAP 299. * *:.* *:..*.: * *: *:.:*::: :*:**: :* 2R5N_Transketolase NKAGTHDSHGAPLGDAEIALTREQLGWKY-APFEIPSEIYAQWDAKEAGQAK-ESAWNEK 310 Y. pestis NKAGTHDSHGAPLGEAEVAATREALGWKY-PAFEIPQDIYAAWDAKEAGKAK-EAAWNEK 310 V. cholerae NKAGSHDCHGAPLGNDEIKAAREFLGWEH-APFEIPADIYAAWDAKQAGASK-EAAWNEK 339 P. aeroginosa NKQGKEECHGAPLGADEIAATRAALGWEH-APFEIPAQIYAEWDAKETGAAQ-EAEWNKR 309 A. tumefaciens NKQGTHKVHGNPLGAEEIAAARKSLNWEA-EAFVIPEDVLDAWRLAGLRSTKTRQDWEAR 310 M. tuberculosis NLMDTGKAHGAALGDDEVAAVKKIVGFDPDKTFQVREDVLTHTRGLVARGKQAHERWQLE 359 *... **.** *:.: :.:..* : :: :. *:. 2R5N_Transketolase FAAYAKAYPQEAAEFTRRMKGEMPSDFDAKAKEFIAKLQANPAKIASRKASQNAIEAFGP 370 Y. pestis FAAYAKAYPELAAEFKRRVSGELPANWAVESKKFIEQLQANPANIASRKASQNALEAFGK 370 V. cholerae FAAYAKAYPAEAAEYKRRVAGELPANWEAATSEIIANLQANPANIASRKASQNALEAFGK 399 P. aeroginosa FAAYQAAHPELAAELLRRLKGELPADFAEKAAAYVADVANKGETIASRKASQNALNAFGP 369 A. tumefaciens LEATETAK---KAEFKRRFAGDLPGNFDSSIDAFKKKIIENNPTVATRKASEDSLEVING 367 M. tuberculosis FDAWARREPERKALLDRLLAQKLPDGWDADLP----HWEPGSKALATRAASGAVLSALGP 415 : * * *..:*.:. :*:* ** :..:. 2R5N_Transketolase LLPEFLGGSADLAPSNLTLWSGSKAIN----------EDAAGNYIHYGVREFGMTAIANG 420 Y. pestis VLPEFLGGSADLAPSNLTIWSGSKSLS----------DDLAGNYIHYGVREFGMSAIMNG 420 V. cholerae LLPEFMGGSADLAPSNLTMWSGSKSLTA---------EDASGNYIHYGVREFGMTAIING 450 P. aeroginosa LLPELLGGSADLAGSNLTLWKGCKGVSA---------DDAAGNYVFYGVREFGMSAIMNG 420 A. tumefaciens ILPEMVGGSADLTPSNNTKTSQMKSITP---------TDFSGRYLHYGIREHGMAAAMNG 418 M. tuberculosis KLPELWGGSADLAGSNNTTIKGADSFGPPSISTKEYTAHWYGRTLHFGVREHAMGAILSG 475 ***: ******: ** *..... *. :.:*:**..* *.* 2R5N_Transketolase ISLHGGFLPYTSTFLMFVEYARNAVRMAALMKQRQVMVYTHDSIGLGEDGPTHQPVEQVA 480 Y. pestis IALHGGFIPYGATFLMFVEYARNAVRMAALMKIRSVFVYTHDSIGLGEDGPTHQPVEQMA 480 V. cholerae IALHGGFVPYGATFLMFMEYARNAMRMAALMKVQNIQVYTHDSIGLGEDGPTHQPVEQIA 510 P. aeroginosa VALHGGFIPYGATFLIFMEYARNAVRMSALMKQRVLYVFTHDSIGLGEDGPTHQPIEQLA 480 A. tumefaciens IALHGGLIPYAGGFLIFSDYCRPSIRLAALMGIRVVHVLTHDSIGVGEDGPTHQPVEQIA 478 M. tuberculosis IVLHGPTRAYGGTFLQFSDYMRPAVRLAALMDIDTIYVWTHDSIGLGEDGPTHQPIEHLS 535 : ***.*. ** * :* * ::*::*** : * ******:*********:*:::

5a) Explain the symbols underneath the alignment:. * : 5b) List all completely conserved Histidines. Please include their residue numbers corresponding to the 2R5N sequence. Now you would like to visualize your findings in 3D. Please open 2R5N in Jmol. => HINT: It might be helpful to use the "select" commands in the Jmol scrip command line, as shown in the Extra-homework set posted under May 18th. 6) From your answer in 4c, color the functional domains according to the assignment by SCOP. It is okay to color both chains. Please draw a crude schematic of the overall protein architecture (comprising both chains) using oval- or egg-like shapes for each domain. Please indicate inter-domain contacts. example: domain a domain c inter-domain contact domain b 7) Examine the structure more closely and identify the different types of ligands by mousing oven them and comparing them with the Ligand Component section on the PDB structure summary page. => Select all ethanediol molecules and hide everything else. (Hint: Ligands are considered "Hetero"atoms in Jmol...) => How many ethanediol molecules do you see? Where could they have originated from? You might want to change their color to "translucent" and select again all atoms. The important ligands should now be better visible.

Using the center and zoom tools in Jmol, navigate the structure to a position from where you have a good overview of the structural environment of a sugar and a TPP ligand in one of the active sites. 8) Now, you want to distinguish between conserved Histidines in the active site and elsewhere in the structure: => Choose a clear representation to display all conserved His-residues found in 5b and note their residue numbers. => Hint: A good strategy could be again to use the command line selection tools. 8a) Which Histidines are NOT in the active site? 8b) List those Histidines which ARE in the active site and also indicate which domain they are located on. 8c) Focus on the Histidine residue that interacts closely with the phosphate moiety of the sugar and note the His-residue number. 8d) Measure the shortest distance between an atom of this His and the sugar. What is the distance? 8e) What kind of bond is formed between the sugar and this Histidine residue? 9) There is a calcium atom present in the active site. 9a) What atoms are bound to the calcium that are not part of the protein? 9b) Which molecule do these atoms belong to? ===================> PLEASE TRUN OVER <====================

10) For Extra credit: Homology Modelling of TktA from Agrobacerium tumefaciens using Swiss Model. Having dealt in so much detail with a homologous structure from E. coli, you would now like to get an impression of how a 3D-model of the Agrobacterium enzyme might look like. http://swissmodel.expasy.org/ => Automated Mode Submit a modelling request using your email address and A. tumefaciens' TktA protein sequence. You will be notified by email when the process is finished (typically takes 30-60 min.). To retrieve the result, you will need to enter a user code which will be sent to your email address. => What structure did Swiss model choose as a homology modeling template. Please list the PDB code. => Did you encounter this structure code already somewhere in the course of the exam??