Title: A topological and conformational stability alphabet for multi-pass membrane proteins

Similar documents
Assessing a novel approach for predicting local 3D protein structures from sequence

CSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004

Technical Seminar 22th Jan 2013 DNA Origami

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

Suppl. Figure 1: RCC1 sequence and sequence alignments. (a) Amino acid

SUPPLEMENTARY INFORMATION

A Protein Secondary Structure Prediction Method Based on BP Neural Network Ru-xi YIN, Li-zhen LIU*, Wei SONG, Xin-lei ZHAO and Chao DU

Structural bioinformatics

Textbook Reading Guidelines

Molecular design principles underlying β-strand swapping. in the adhesive dimerization of cadherins

CFSSP: Chou and Fasman Secondary Structure Prediction server

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

BCH222 - Greek Key β Barrels

Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS*

Protein Structure Prediction

Visualization of codon-dependent conformational rearrangements during translation termination

Statistical Machine Learning Methods for Bioinformatics VI. Support Vector Machine Applications in Bioinformatics

CS273: Algorithms for Structure Handout # 5 and Motion in Biology Stanford University Tuesday, 13 April 2004

Nature Biotechnology: doi: /nbt Supplementary Figure 1

Docking. Why? Docking : finding the binding orientation of two molecules with known structures

Representation in Supervised Machine Learning Application to Biological Problems

Protein Structure. Protein Structure Tertiary & Quaternary

BIOINFORMATICS Introduction

Structural Bioinformatics (C3210) Conformational Analysis Protein Folding Protein Structure Prediction

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Structural Bioinformatics (C3210) DNA and RNA Structure

BIRKBECK COLLEGE (University of London)

Sequence Analysis '17 -- lecture Secondary structure 3. Sequence similarity and homology 2. Secondary structure prediction

Figure S1

Predicting Protein Structure and Examining Similarities of Protein Structure by Spectral Analysis Techniques

Supplementary Information. Structural basis for duplex RNA recognition and cleavage by A.

RNA does not adopt the classic B-DNA helix conformation when it forms a self-complementary double helix

What s New in Discovery Studio 2.5.5

Supplementary Fig.1. Accuracy of LIPS predicted TMH binding surfaces.

COMPAS for the Analysis of SELEX Experiments

Name with Last Name, First: BIOE111: Functional Biomaterial Development and Characterization MIDTERM EXAM (October 7, 2010) 93 TOTAL POINTS

SUPPLEMENTARY INFORMATION

Structure formation and association of biomolecules. Prof. Dr. Martin Zacharias Lehrstuhl für Molekulardynamik (T38) Technische Universität München

Protein Structural Motifs Search in Protein Data Base

All Rights Reserved. U.S. Patents 6,471,520B1; 5,498,190; 5,916, North Market Street, Suite CC130A, Milwaukee, WI 53202

STRUCTURAL BIOLOGY. α/β structures Closed barrels Open twisted sheets Horseshoe folds

Supplementary Fig. 1. Initial electron density maps for the NOX-D20:mC5a complex obtained after SAD-phasing. (a) Initial experimental electron

Supplementary Material. for. A Tool Set to Map Allosteric Networks through the NMR Chemical Shift Covariance Analysis

Sequence Determinants of a Conformational Switch in a Protein

Protein structure. Wednesday, October 4, 2006

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling

Exploring Similarities of Conserved Domains/Motifs

M1 - Biochemistry. Nucleic Acid Structure II/Transcription I

Bacteriophages get a foothold on their prey

Supplementary Figure 1.

J. DISTRIBUTION STATEMENT A Approved for Public Release. a. vm-mmm OHGANüATIOH. Reset

Protein Structure Analysis

CASP 13 Assembly assessment

Molecular Biology Services. Make Research Easy

Fundamentals of Biochemistry

Molecules VII. Structures and Functions of Proteins

Newsletter Issue 7 One-STrEP Analysis of Protein:Protein-Interactions

Prot-SSP: A Tool for Amino Acid Pairing Pattern Analysis in Secondary Structures

RNP purification, components and activity.

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

Gene Identification in silico

Chapter 14 Regulation of Transcription

Molecular Modeling 9. Protein structure prediction, part 2: Homology modeling, fold recognition & threading

A New Database of Genetic and. Molecular Pathways. Minoru Kanehisa. sequencing projects have been. Mbp) and for several bacteria including

Protein Domain Boundary Prediction from Residue Sequence Alone using Bayesian Neural Networks

Nature Methods: doi: /nmeth Supplementary Figure 1. Construction of a sensitive TetR mediated auxotrophic off-switch.

X-ray structures of fructosyl peptide oxidases revealing residues responsible for gating oxygen access in the oxidative half reaction

6-Foot Mini Toober Activity

SUPPLEMENTARY INFORMATION Figures. Supplementary Figure 1 a. Page 1 of 30. Nature Chemical Biology: doi: /nchembio.2528

Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions

STRUCTURE, DYNAMICS AND INTERACTIONS OF PROTEINS BY NMR SPECTROSCOPY

Nagahama Institute of Bio-Science and Technology. National Institute of Genetics and SOKENDAI Nagahama Institute of Bio-Science and Technology

Supplementary Table 1: Antigenic regions/sites on Ebola-GP identified using GFPDL*

Textbook Reading Guidelines

Distributions of Beta Sheets in Proteins with Application to Structure Prediction

Introduction to Proteins

Protein Structure Prediction. christian studer , EPFL

Biochem Fall Sample Exam I Protein Structure. Vasopressin: CYFQNCPRG Oxytocin: CYIQNCPLG

This practical aims to walk you through the process of text searching DNA and protein databases for sequence entries.

Supplementary Figures Supplementary Figure 1

Molecular Cell Biology - Problem Drill 06: Genes and Chromosomes

Supplementary Information

Collagen. 7.88J Protein Folding. Prof. David Gossard October 20, 2003

Homology Modeling of the Chimeric Human Sweet Taste Receptors Using Multi Templates

Computational Methods for Protein Structure Prediction and Fold Recognition... 1 I. Cymerman, M. Feder, M. PawŁowski, M.A. Kurowski, J.M.

TERTIARY MOTIF INTERACTIONS ON RNA STRUCTURE

Identification and characterization of DNA aptamers specific for

RPA-AB RPA-C Supplemental Figure S1: SDS-PAGE stained with Coomassie Blue after protein purification.

Structural Bioinformatics GENOME 541 Spring 2018

Supplemental Information Molecular Cell, Volume 41

BMB/Bi/Ch 170 Fall 2017 Problem Set 1: Proteins I

C2H2 zinc finger proteins greatly expand the human regulatory lexicon

RNA folding on the 3D triangular lattice. Joel Gillespie, Martin Mayne and Minghui Jiang

Immune system IgGs. Carla Cortinas, Eva Espigulé, Guillem Lopez-Grado, Margalida Roig, Valentina Salas. Group 2

BIOLOGY 200 Molecular Biology Students registered for the 9:30AM lecture should NOT attend the 4:30PM lecture.

Unit 1: DNA and the Genome. Sub-Topic (1.3) Gene Expression

Across-proteome modeling of dimer structures for the bottom-up assembly of protein-protein interaction networks

" Rational drug design; a Structural Genomics approach"

A data-driven protein-structure prediction algorithm

Transcription:

Supplementary Information Title: A topological and conformational stability alphabet for multi-pass membrane proteins Authors: Feng, X. 1 & Barth, P. 1,2,3 Correspondences should be addressed to: P.B. (patrickb@bcm.edu) 1 Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA. 2 Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA. 3 Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.

Supplementary Results Supplementary Figure 1. A large majority of TMHs in multi-pass membrane proteins form binding interfaces with two other helices. Venn Diagram describing the fractions of TMHs in multi-pass membrane proteins: 1) interacting with two other helices, 2) interacting with two other helices with more than one residue forming contacts with the 2 helices simultaneously, 3) interacting with two other helices with more than five residues forming contacts with the 2 helices simultaneously. Contacts are defined by residues pairs with Cα distance less than 9Å.

Supplementary Figure 2. Uncovering universal sequence/structure principles governing multi-pass membrane protein topology and structure flexibility. a. The panel describes the process of identifying consensus sequence/structure motifs from the protein structure database. Elementary interacting TMH trimer units are extracted from unrelated protein structures, clustered into structurally similar families. Combinations of residues enriched within each trimer family creating consensus networks of stabilizing interhelical contacts are identified. b. The panel describes the stringent validation of the sequence/structure determinants of TMH trimer packing. If the sequence motifs are strong determinants of trimer conformations, prediction of trimer topology from sequence should be feasible. If motifs and associated interhelical contacts are strong determinants of trimer stability, they should guide the prediction of local conformational flexibility in TM proteins.

Supplementary Figure 3. Pi chart describing the clustering of TMH trimers into structurally similar families based on Cα RMSD. 56% of the trimer unit library can be classified into only 6 well-defined structural classes. The topology and percentage of each trimer type is labeled in the chart.

Supplementary Table 1. The geometry of the 6 most populated trimer structure clusters. The table describes the geometrical features of the 3 helical pairs constituting each trimer type. Helices are numbered (second column) according to a reference trimer structure in each class to assign helix pairs with specific topology. The same helix numbers are used to assign enriched motifs to specific helices in main text Figure 1 and Supplementary Figure 2.

Supplementary Table 2: Table of enriched sequence motifs at TMH trimer interfaces. For each trimer structure class (i.e. cluster), the p-value of enrichment of a motif on a specific helix is calculated using TMSTAT (see methods) and reported for two homology reduction thresholds: 60% sequence identity (60SI) removing close homologs for the structural analysis and one protein per superfamily (1/superfamily), a stringent homology reduction for training and testing the topology predictor (see methods). Motifs are classified as major if they form consensus contacts at trimer interfaces and are found in at least 3 protein families and superfamilies (see methods). P-values highlighted in red correspond to motifs that loose significant enrichment (pvalue 0.05) in the dataset composed of only one protein per superfamily.

Supplementary Figure 4. Proteins from the same superfamily display different sequence/structure determinants of the same trimer topology. Four examples of trimers from the all-left handed trimer structure class. a,b. Trimers are from two proteins of the Cytochrome c oxidase subunit I-like superfamily. These proteins share more than 30% sequence identity but bear distinct enriched sequence motif at the same trimer interface. c,d. Trimers are from two proteins of the MFS general substrate transporter superfamily but bear distinct enriched sequence motif at the same trimer interface. Trimers across protein superfamilies (a,c and b,d) share the same motifs.

Supplementary Table 3: Summary of consensus contact map and number of interacting helices for each trimer motif. The table describes the number of consensus contacts and the number of helices interacting with each position of the motif. A contact between one position of a motif and a residue on adjacent helices is defined as consensus if it is found in more than 50% of the trimers from the same structure class sharing the same sequence motif. For comparison, the average number of contacts established by the same position of the motif in all trimers sharing the same sequence motif is provided in parentheses. * Threonine is also observed at both positions. + Methionine is also observed in the second position, # Histidine is also observed in the first position.

Supplementary Table 4. Support Vector Classification (SVC) of trimer sequences into structures. The table reports the accuracy in correctly assigning trimer sequences into one of two possible trimer structure classes. The accuracy of classification is calculated using 5-fold cross validation (see method). Results are given for all possible 15 pairs of trimer structure classes. Random assignment would correspond to 50% accuracy. Below the table, the average accuracy for two-classes assignment is reported. * Average accuracy for two classes assignment is significantly higher than random assignment with P-value < 0.0001 using Welch s t-test. The accuracy in correctly assigning trimer sequences into one of six possible trimer structure classes (6-classes assignment) is reported below the table. Random assignment would correspond to 16.7% accuracy.

Supplementary Table 5. Trimers bearing sequence/contact motifs are less flexible. The left part of the Supplementary Table displays the name of the proteins and the PDB codes of different protein conformations (states) that differ by at least 0.5Å in Cα RMSD. The right part of the table indicates the extent of conformational changes of each trimer unit in the protein measured by Cα RMSD (Å). The trimer unit is defined by the trimer type (i.e. topology) and motif type. The topology is encoded as follows: AL: All left-handed Trimer; AR, All Right-handed Trimer; PL: Parallel & Left handed Trimer; PR: Parallel & Right handed Trimer; LR1: Left & Right Type 1 Trimer; LR2: Left & Right Type 2 Trimer. Motif types are encoded as follows: NA: no sequence/structure motif found in the trimer; (AL, M1): I/L/V/M-X 3 -G/A/S; (AL, M2): I/L/V/M- X 6 -I/L/V/M; (AL, M3): G/A/S-X 6 -G/A/S; (AL, M4): I/L/V/M-X 6 -F/W/Y; (AR, M1): G/A/S-X 3 -G/A/S; (AR, M2): G/A/S-X 7 -G/A/S; (PR, M1): I/L/V/M-X 3 -F/W/Y; (PR, M2): F/W/Y-X 2 -F/W/Y; (LR1, M1): F/W/Y-X 3 -G/A/S; (LR1, M2): F/W/Y-X 6 -G/A/S; (LR2, M1): F/W/Y-X 6 -G/A/S.

Supplementary Table 6. Trimers extracted only from distinct protein superfamilies bearing sequence/contact motifs are less flexible. Compared to Main Text Figure 6 and Supplementary Table 5, only one protein per superfamily was selected leading to 10 protein structures and 40 trimer units. The left part of the Supplementary Table displays the name of the proteins and the PDB codes of different protein conformations (states) that differ by at least 0.5Å in Cα RMSD. The right part of the table indicates the extent of conformational changes of each trimer unit in the protein measured by Cα RMSD (Å). The trimer unit is defined by the trimer type (i.e. topology) and motif type. The topology is encoded as follows: AL: All left-handed Trimer; AR, All Right-handed Trimer; PL: Parallel & Left handed Trimer; PR: Parallel & Right handed Trimer; LR1: Left & Right Type 1 Trimer; LR2: Left & Right Type 2 Trimer. Motif types are encoded as follows: NA: no sequence/structure motif found in the trimer; (AL, M1): I/L/V/M- X 3 -G/A/S; (AL, M2): I/L/V/M-X 6 -I/L/V/M; (AL, M3): G/A/S-X 6 -G/A/S; (AR, M1): G/A/S-X 3 -G/A/S; (AR, M2): G/A/S-X 7 -G/A/S; (PR, M1): I/L/V/M-X 3 -F/W/Y; (PR, M2): F/W/Y-X 2 -F/W/Y; (LR1, M1): F/W/Y-X 3 -G/A/S; (LR1, M2): F/W/Y-X 6 -G/A/S; (LR2, M1): F/W/Y-X 6 -G/A/S.

Supplementary Figure 5. Sequence/3D contact motifs are strong predictors of local conformational stability. Distribution of trimer unit structural changes (measured by Cα rmsd in Å) in multi-pass membrane proteins crystallized in distinct conformations. Only one protein per superfamily was selected leading to 10 protein structures and 40 trimer units. Data for trimers containing enriched sequence/contact motifs are in black, others are in grey. Trimers with sequence motifs and corresponding interaction patterns had substantially smaller Cα rmsd (P < 0.0005, Welch s t-test) between distinct protein conformations and were therefore significantly more rigid than the trimers without such sequence/3d contact features.