Introduction to Bioinformatics Online Course: IBT

Similar documents
11 questions for a total of 120 points

6-Foot Mini Toober Activity

Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture - 5 Protein Structure - III

Create a model to simulate the process by which a protein is produced, and how a mutation can impact a protein s function.

DNA is normally found in pairs, held together by hydrogen bonds between the bases

What is necessary for life?

BIOLOGY. Monday 14 Mar 2016

2018 Protein Modeling Exam Key

What is necessary for life?

Protein Structure Analysis

Fundamentals of Protein Structure

Basic concepts of molecular biology

Connect the dots DNA to DISEASE

Algorithms in Bioinformatics ONE Transcription Translation

STRUCTURE, DYNAMICS AND INTERACTIONS OF PROTEINS BY NMR SPECTROSCOPY


All Rights Reserved. U.S. Patents 6,471,520B1; 5,498,190; 5,916, North Market Street, Suite CC130A, Milwaukee, WI 53202

DNA.notebook March 08, DNA Overview

LIST OF ACRONYMS & ABBREVIATIONS

Introduction to Bioinformatics Online Course: IBT

The Transition to Life!

Introduction to Bioinformatics for Computer Scientists. Lecture 2

Unit 1. DNA and the Genome

The Transition to Life

2) Which functional group is least important in biochemistry? A) amine B) ester C) hydroxyl D) aromatic E) amide

Protein Synthesis. Application Based Questions

Materials Protein synthesis kit. This kit consists of 24 amino acids, 24 transfer RNAs, four messenger RNAs and one ribosome (see below).

Basic concepts of molecular biology

Bioinformatics. ONE Introduction to Biology. Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012

Protein Synthesis: Transcription and Translation

Bi Lecture 3 Loss-of-function (Ch. 4A) Monday, April 8, 13

Additional Case Study: Amino Acids and Evolution

Key Concept Translation converts an mrna message into a polypeptide, or protein.

APPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources

Granby Transcription and Translation Services plc

Using DNA sequence, distinguish species in the same genus from one another.

SNVBox Basic User Documentation. Downloading the SNVBox database. The SNVBox database and SNVGet python script are available here:

Comparative Modeling Part 1. Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center

EE550 Computational Biology

From mechanism to medicne

Structure formation and association of biomolecules. Prof. Dr. Martin Zacharias Lehrstuhl für Molekulardynamik (T38) Technische Universität München

Bioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras. Lecture - 5a Protein sequence databases

Human Biology 175 Lecture Notes: Cells. Section 1 Introduction

A self- created drawing, collage, or computer generated picture of your creature.

A 1. How many nitrogen atoms are found in the backbone of each amino acid? A. 1 B. 2 C. 3 D. 4

Deoxyribonucleic Acid DNA. Structure of DNA. Structure of DNA. Nucleotide. Nucleotides 5/13/2013

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS

STRUCTURAL BIOLOGY. α/β structures Closed barrels Open twisted sheets Horseshoe folds

Molecular Biology. Biology Review ONE. Protein Factory. Genotype to Phenotype. From DNA to Protein. DNA à RNA à Protein. June 2016

Genomics and Database Mining (HCS 604.3) April 2005

UNIT I RNA AND TYPES R.KAVITHA,M.PHARM LECTURER DEPARTMENT OF PHARMACEUTICS SRM COLLEGE OF PHARMACY KATTANKULATUR

Exploring genomes - A Treasure Hunt on the Internet

Sequence Analysis '17 -- lecture Secondary structure 3. Sequence similarity and homology 2. Secondary structure prediction

Hmwk # 8 : DNA-Binding Proteins : Part II

Just one nucleotide! Exploring the effects of random single nucleotide mutations

NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY DEPARTMENT OF BIOTECHNOLOGY Professor Bjørn E. Christensen, Department of Biotechnology

Natural Capital Protocol in Practice Ajinomoto

Artificial Intelligence in Prediction of Secondary Protein Structure Using CB513 Database

Protein Structure and Function! Lecture 4: ph, pka and pi!

Turning λ Cro into a

Textbook Reading Guidelines

BIOLOGY 200 Molecular Biology Students registered for the 9:30AM lecture should NOT attend the 4:30PM lecture.

BIRKBECK COLLEGE (University of London)

Nucleic acid and protein Flow of genetic information

Proteins: Wide range of func2ons. Polypep2des. Amino Acid Monomers

THE GENETIC CODE Figure 1: The genetic code showing the codons and their respective amino acids

Name. Student ID. Midterm 2, Biology 2020, Kropf 2004

Chapter 3-II Protein Structure and Function

The Beery Twins Story and Sepiapterin Reductase

What happens after DNA Replication??? Transcription, translation, gene expression/protein synthesis!!!!

Lecture 2 Introduction to Data Formats

Aipotu II: Biochemistry

ENGR 213 Bioengineering Fundamentals April 25, A very coarse introduction to bioinformatics

MOLEBIO LAB #3: Electrophoretic Separation of Proteins

Homework. A bit about the nature of the atoms of interest. Project. The role of electronega<vity

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

BIOSTAT516 Statistical Methods in Genetic Epidemiology Autumn 2005 Handout1, prepared by Kathleen Kerr and Stephanie Monks

The study of protein secondary structure and stability at equilibrium ABSTRACT

1. How many nitrogen atoms are found in the backbone of each amino acid? A. 1 B. 2 C. 3 D. 4

PROTEINS & NUCLEIC ACIDS

Biochem Fall Sample Exam I Protein Structure. Vasopressin: CYFQNCPRG Oxytocin: CYIQNCPLG

Protein Structure. Protein Structure Tertiary & Quaternary

What Is A Genome? GENOME

Protein Synthesis Review Bi 12 /25

ENZYMES AND METABOLIC PATHWAYS

Chapter 12-3 RNA & Protein Synthesis Notes From DNA to Protein (DNA RNA Protein)

Designer Genes Tryout Test Carmel Science Olympiad

produces an RNA copy of the coding region of a gene

Steroids. Steroids. Proteins: Wide range of func6ons. lipids characterized by a carbon skeleton consis3ng of four fused rings

arxiv:math/ v1 [math.pr] 24 Apr 2003

Protein Structure Prediction by Constraint Logic Programming

Nucleic Acids, Proteins, and Enzymes

CELL BIOLOGY - CLUTCH CH. 7 - GENE EXPRESSION.

AMINO ACIDS, POLYPEPTIDES AND PROTEINS

Protein Structure, Function, and Folding

Lecture 19A. DNA computing

HASPI Medical Biology Lab 02 Background/Introduction

Aipotu I & II: Genetics & Biochemistry

Chapter 17. Prediction, Engineering, and Design of Protein Structures

Name: TOC#. Data and Observations: Figure 1: Amino Acid Positions in the Hemoglobin of Some Vertebrates

Transcription:

Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec6:Interpreting Your Multiple Sequence Alignment

Interpreting Your Multiple Sequence Alignment

Interpret your multiple sequence alignment The interpretation of a multiple alignment depends very much on its appearance. Some tools on the Net can help you make sense of your multiple alignments by extracting blocks or singling out special positions.

- Interpreting an alignment is a bit of an art. - E-values (the scores that tell you how reliable your database search is )

That means deciding whether your alignment is correct still involves some educated guesswork.

- DNA alignments are by far the most difficult to interpret. - If you re analyzing this type of sequence, you want a very high level of conservation, knowing that single conserved columns are likely to be meaningless.

- A DNA block is only informative when it contains several identical columns in a cluster. - Even with the DNA of closely related sequences, obtaining such an alignment is still difficult. - This is why most biologists prefer protein alignments.

Recognizing the good parts in a protein alignment

- The most convincing evaluative grid we have for a protein multiple alignment stems from our knowledge of protein structures.

- We know that structures contain surface loops that evolve rapidly. (Loops are softer portions of the protein that connect its more rigid portions).

Protein structures also contain core regions that act as support walls for the protein. These support walls evolve less rapidly than the loops on the surface..

In your multiple alignment, you can expect to find nice, gap-free blocks that correspond to the core regions and gap-rich regions that correspond to the loops.

Cabalistic signs The last line contains seemingly ClustalW, MUSCLE, or Tcoffee alignment, cabalistic signs such as (*), (:), or (.). (*) A star indicates an entirely conserved column. (:) A colon indicates columns where all the residues have roughly the same size and the same hydropathy. (.) A period indicates columns where the size OR the hydropathy has been preserved in the course of evolution.

The average good block is: - A unit at least 10 30 amino acids long, exhibiting at least one to three stars (*), a few more colons (:) close to the stars, and a several periods (.) scattered along the MSA result.

The magic thing about multiple sequence alignments is that 4 or 5 conserved positions over 50 amino acids can be enough to convince us that we re looking at a genuine signal. This is less than 10 percent identity! You have to remember that we require at least 25 percent identity to consider a pairwise alignment

Conserved columns in a multiple sequence alignment are meaningful only when the surrounding columns are not conserved

Another criterion for a useful multiple alignment is knowing the type of amino acids you can expect to see conserved.

Amino acids aren t equal and they all have very characteristic patterns of mutation/conservation in a multiple sequence alignment.

Patterns of Conservation in Multiple Sequence Alignments

W(tryptophans),F(phenylalanine), Y(tyrosine) It is common to find conserved tryptophans. Tryptophan is a large hydrophobic residue that sits deep in the core of proteins. It plays an important role in their stability and is therefore difficult to mutate. When tryptophan mutates, it is usually replaced by another aromatic amino acid, such as phenylalanine or tyrosine. Patterns of conserved aromatic amino acids constitute the most common signatures for recognizing protein domains.

G (glycine), P (proline) It is common to find conserved columns with a glycine or a proline in a multiple alignment. These two amino acids often coincide with the extremities of well-structured beta strands or alpha helices.

C (cysteines) Cysteines are famous for making C-C (disulphide) bridges. Conserved columns of cysteines are rather common and usually indicate such bridges. Columns of conserved cysteines with a specific distance provide a useful signature for recognizing protein domains and folds.

H(Histidine), S(serine) Histidine and serine are often involved in catalytic sites, especially those of proteases. Conserved histidine or a conserved serine are good candidates for being part of an active site.

K (Lysine), R (Arginine), D (Aspartic Acid), E (Glutamic Acid) These charged amino acids are often involved in ligand binding. Highly conserved columns can also indicate a salt bridge inside the core of the protein.

L (Leucines) Leucines are rarely very conserved unless they re involved in protein-protein interactions such as a leucine zipper.

Department of Genetics, Zagazig University, Zagazig, Egypt