Towards definition of an ECM parts list: An advance on GO categories

Similar documents
Extracellular Matrix Biology

Dual Roles for PARP1 during Heat Shock: Transcriptional Activator and Posttranscriptional Inhibitor of Gene Expression

Old EXAM 1 BIO409/509 NAME. Please number your answers and write them on the attached, lined paper.

Exam MOL3007 Functional Genomics

Preface to the Series

CRISPR GENOMIC SERVICES PRODUCT CATALOG

Textbook Reading Guidelines

Proteomics. Manickam Sugumaran. Department of Biology University of Massachusetts Boston, MA 02125

(Length: 16 min; file size: 9.4 MB; file format: mp3; location:

Cell-Extracellular Matrix Interactions

LIFE Formation III. Morphogenesis: building 3D structures START FUTURE. How-to 2 PROBLEMS FORMATION SYSTEMS SYSTEMS.

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

cellular interactions

Introduction to 'Omics and Bioinformatics

Motivation From Protein to Gene

1 Scaffolds. 2 Extracellular matrix

Protein Expression In Mammalian Cells: Methods And Protocols (Methods In Molecular Biology)

Teaching Principles of Enzyme Structure, Evolution, and Catalysis Using Bioinformatics

Glycomics/Glycobiology

Practice Exam A. Briefly describe how IL-25 treatment might be able to help this responder subgroup of liver cancer patients.

Protein-Protein Interactions: A Molecular Cloning Manual READ ONLINE

PCR Arrays. An Advanced Real-time PCR Technology to Empower Your Pathway Analysis

Reproductive Biology and Endocrinology BioMed Central Review Smad signalling in the ovary Noora Kaivo-oja, Luke A Jeffery, Olli Ritvos and David G Mot

8,003,386 Tumor necrosis factor receptors 6.alpha. and 6.beta. 7,851,596 Myeloid progenitor inhibitory factor- 1 (MPIF- 1) polypeptides

Personalized CAR-T Immunotherapy Platform

The Extracellular Matrix: Not Just Pretty Fibrils

BIOL 461/ 661 Cell Biology 4 Credits Instructor: Dr. Kristin O Brien. T/TH 11:30-1:00, Irving I 208 Office hours: M 9-10, TH 3-4

BIOINFORMATICS Introduction

Time allowed: 2 hours Answer ALL questions in Section A, ALL PARTS of the question in Section B and ONE question from Section C.

MARINE BIOINFORMATICS & NANOBIOTECHNOLOGY - PBBT305

ksierzputowska.com Research Title: Using novel TALEN technology to engineer precise mutations in the genome of D. melanogaster

Introduction to Genome Biology

GREG GIBSON SPENCER V. MUSE

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005

Chemical Biology In-class discussion: Tethering

From Proteomics to Systems Biology. Integration of omics - information

Proteomics and some of its Mass Spectrometric Applications

Supplement to: The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 cell line

Transcriptome Assembly, Functional Annotation (and a few other related thoughts)

Hector R. Wong, MD Division of Critical Care Medicine Cincinnati Children s Hospital Medical Center Cincinnati Children s Research Foundation

Functional profiling of metagenomic short reads: How complex are complex microbial communities?

Finding Genes with Genomics Technologies

Outline and learning objectives. From Proteomics to Systems Biology. Integration of omics - information

Lecture 3 MOLECULAR REGULATION OF DEVELOPMENT

7.06 Cell Biology QUIZ #3

NAME TA SEC Problem Set 5 FRIDAY October 29, 2004

Lab Module 7: Cell Adhesion

Communicating Science:

MICB688L/MOCB639 Advanced Cell Biology Exam II

Cell biology.

A History of Bioinformatics: Development of in silico Approaches to Evaluate Food Proteins

Final exam: Introduction to Bioinformatics and Genomics DUE: Friday June 29 th at 4:00 pm

Contents. The Right Surface for Every Cell Extracellular Matrices and Biologically Coated Surfaces ECM Mimetic and Advanced Surfaces...

Department of Systems Biotechnology

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

27041, Week 02. Review of Week 01

Monoclonal antibody to fibronectin which inhibits. Extracellular matrix assembly

Gene expression analysis. Biosciences 741: Genomics Fall, 2013 Week 5. Gene expression analysis

Activity-Based Proteomics Protein and Ligand Discovery on a Global Scale

3D Hyaluronan Hydrogels as Matrices for Spheroid Formation

Post-translational modification

Bioinformation by Biomedical Informatics Publishing Group

BIOINFORMATICS IN AQUACULTURE. Aleksei Krasnov AKVAFORSK (Ås, Norway) Bergen, September 21, 2007

Understanding the Cellular Mechanism of the Excess Microsporocytes I (EMSI) Gene. Andrew ElBardissi, The Pennsylvania State University

Quo vadis Drug Discovery

Introduction to Proteins

The Integrated Biomedical Sciences Graduate Program

Mammalian EGF receptor activation by the rhomboid protease RHBDL2

Cyber DNA Extraction and Gel Electrophoresis

This Technical Note describes these characterization studies in more detail.

Zool 3200: Cell Biology Exam 3 3/6/15

Large-Scale Single Guide RNA Library Construction and Use for CRISPR Cas9-Based Genetic Screens

Investing in Discovery

This practical aims to walk you through the process of text searching DNA and protein databases for sequence entries.

Isolation and Characterization of the Porcine Tissue Kallikrein Gene Family

Applications of HMMs in Epigenomics

Chapter 15 Gene Technologies and Human Applications

Supplementary Materials: Alternative Splicing in Adhesion- and Motility-Related Genes in Breast Cancer

Inserting genes into plasmids

Data analysis: YeastMine, GO tools, and use cases

OriGene GFC-Arrays for High-throughput Overexpression Screening of Human Gene Phenotypes

Lecture 4: Degradable Materials with Biological Recognition (part II)

Development of new polymeric biomaterials for in vitro and in vivo liver reconstruction

Basics of 3D Cell Culture: Forming spheroids in the presence of extracellular matrix

File S1. Program overview and features

CELL BIOLOGY BIOL3030, 3 credits Fall 2012, Aug 20, Dec 14, 2012 Tuesday/Thursday 8:00 am-9:15 am Bowman-Oddy Laboratories 1049

Genome annotation & EST

List of Supervisors and Projects for Summer Research Program 2018

CHAPTER 21 LECTURE SLIDES

Understanding protein lists from comparative proteomics studies

Annotation and the analysis of annotation terms. Brian J. Knaus USDA Forest Service Pacific Northwest Research Station

3'UTR elements inhibit Ras-induced C/EBPß posttranslational activation and senescence in tumor cells

Medicinal Chemistry of Modern Antibiotics

Medicinal Chemistry of Modern Antibiotics

EUROPEAN PATENT OFFICE U.S. PATENT AND TRADEMARK OFFICE CPC NOTICE OF CHANGES 558 DATE: AUGUST 1, 2018 PROJECT RP0075 A61K 38/1764

Talk with the editors of THE JOURNAL OF BIOLOGICAL CHEMISTRY

Genomics and Proteomics *

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

z-movi High-throughput Label-free Cell Interaction Studies z-movi I Application & Product Brochure

ESSENTIAL BIOINFORMATICS

Transcription:

Towards definition of an ECM parts list: An advance on GO categories The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher Naba, Alexandra, Sebastian Hoersch, and Richard O. Hynes. Towards Definition of an ECM Parts List: An Advance on GO Categories. Matrix Biology 31, no. 7-8 (September 2012): 371 372. http://dx.doi.org/10.1016/j.matbio.2012.11.008 Elsevier B.V. Version Author's final manuscript Accessed Sun Nov 26 03:27:19 EST 2017 Citable Link Terms of Use Detailed Terms http://hdl.handle.net/1721.1/106801 Creative Commons Attribution-NonCommercial-NoDerivs License http://creativecommons.org/licenses/by-nc-nd/4.0/

Towards Definition of an ECM Parts List: an Advance on GO Categories Alexandra Naba, Sebastian Hoersch and Richard O. Hynes Howard Hughes Medical Institute and Bioinformatics and Computing Facility Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology Cambridge, Massachusetts 02139, USA. Those of us interested in the extracellular matrix (ECM) are faced with significant challenges of definition. ECM proteins are large, complex and assembled into crosslinked insoluble matrices. This has meant that defining the biochemical composition of ECMs has been difficult. Nonetheless, protein chemistry and molecular biology have defined many familiar ECM proteins collagens, proteoglycans, laminins, thrombospondins, tenascins, fibronectins, etc. With the completion of many genomes it should now be possible to develop complete parts lists for the ECM. Such lists are needed for analyzing data from omic approaches such as expression arrays, latest-generation sequencing and proteomics. These approaches generate long lists and it is typically necessary to extract from those lists the genes/proteins of interest. Anyone who attempts to do this using the commonly used gene ontology (GO) categories soon discovers that they are largely useless for defining ECM proteins. Many ECM proteins are unannotated and those which are, are sorted, with little evidence of logic or consistency, into diverse categories such extracellular matrix, basement membrane, cell surface and many others. The human and mouse orthologs are often found in different categories and attempts to use GO categories to extract a complete list of ECM genes or proteins from a data set are unsatisfactory at best. Faced with this problem in the course of an ECM proteomics project, we decided we needed to develop a better list of ECM proteins (Naba et al., 2012). This turned out to be not entirely straightforward. While the familiar ECM glycoproteins, collagens and proteoglycans could be collected relatively easily, the standard procedures for collecting lists of homologs, using BLAST or domain-based searches, quickly run into problems. ECM proteins are characteristically formed from multiple domains and those domains are shared among different ECM proteins and also with non-ecm proteins (Hohenester and Engel, 2002; Adams and Engel, 2007). Two obvious examples among many are EGF and FN3 domains, both very ancient 1

protein domains that predate the origins of extracellular matrix in metazoa. They are found in many ECM proteins but also in diverse membrane proteins and secreted factors. So a BLAST or domain search produces a confusing mix of homologs. It is the domain architecture (the entire domain composition, number and order) that defines families of ECM proteins. Nonetheless, the domain composition of ECM proteins does offer the route to defining an essentially complete list of ECM proteins and sorting them from homologous but non-ecm proteins (i.e., those which share one or more domains with ECM proteins but are clearly not matrix proteins examples would include many tyrosine kinase or phosphatase receptors or adhesion receptors). We started with a list of characteristic ECM domains and used them to pull out all genes/proteins containing those domains from the human and mouse genomes/proteomes. We then culled those lists using a list of excluding domains to remove contaminants the excluding list contained domains such as kinase, phosphatase and protease, chosen to eliminate the obvious contaminants. We performed this positive/negative sweep procedure iteratively, checking to ensure that we collected all the ECM proteins we could think of (that meant adding and deleting some domains) and eliminated all the contaminants. The details are given in Naba et al. (2012). We ended up with a list of around 50 including domains and around 20 excluding domains that effectively collected all known (at least to us) ECM proteins and did not select most growth factor and adhesion receptors. We then screened out any proteins with transmembrane domains, with the exception of a few collagens. The eventual lists (human and mouse) each of around 300 proteins we called the core matrisome (Figure 1). These lists included all the known collagens, proteoglycans and well defined ECM glycoproteins along with additional proteins that had all the structural (domain) characteristics of ECM proteins but about which essentially nothing is known - presumptive novel ECM proteins. However, another question of definition arose immediately. How do we define extracellular matrix? Does it include bound growth factors or bound ECM-modifying enzymes? And what about protein families such as mucins, galectins and semaphorins or proteins containing short stretches of collagen triple helix (C1q, collectins, ficolins, acetylcholine esterase)? Operationally, many of these proteins fractionate/co-purify with extracellular matrix and certainly contribute to its biological functions. Scientists may differ on whether or not to include such protein families (and others) in the definition of ECM proteins. So we decided to develop 2

some additional matrisome-associated categories, using similar strategies of including and excluding domains. Specifically we defined a list of secreted factors known growth factors and their homologs (TGF-β, BMPs, PDGFs, FGFs, Wnts, Hedgehogs, S100 proteins, chemokines etc.). We were deliberately inclusive and did not restrict these lists to growth factors, cytokines and chemokines known to bind to ECM proteins but also all their homologs as well as other families of secreted factors that might bind to ECM. Similarly we developed a list of ECM regulators (MMPs and other proteases, including membrane-bound proteases such as ADAMs and ADAM-TS proteins as well as other protease families and protease inhibitors) and ECM crosslinking enzymes (transglutaminases, lysyl oxidases and prolyl hydroxylases and regulators of these modifiers). Finally, we developed a list of ECM-affiliated proteins to include proteins that some might consider ECM-associated, whereas others would not. This list includes mucins, C-type lectins, semaphorins, syndecans, glypicans, as well as some protein families that we included because some members of the family co-enriched with genuine ECM proteins in our experiments (annexins, galectins). We believe that the core matrisome categories (collagens, proteoglycans and ECM glycoproteins) are robust and not likely to change much with further analyses, at least for mammals and probably other vertebrates (other taxonomic groups clearly do contain additional ECM proteins). However, the matrisome-associated categories (secreted factors, regulators and affiliated proteins) are, by their nature, less firmly established and we suspect that they may well evolve in light of subsequent analyses. These latter categories were deliberately inclusive although many proteins within those categories undoubtedly do bind reproducibly to ECM, others may not (see Figure 1). Our aim was to define categories that would capture all candidate components of the ECM. Having generated these categorical lists (see Naba et al., 2012; Hynes and Naba, 2012 and http://web.mit.edu/hyneslab/matrisome/) we would like to suggest that these six subcategories offer a much more workable way to select out sub-lists of ECM proteins and ECM-associated proteins from data sets, be they derived from proteomics, genomics, expression profiling or any other genome-scale analyses. We believe these lists will be straightforward to use. We expect them to evolve and welcome suggestions for additions or other modifications. It is our intention to maintain the matrisome web site and incorporate additional information in due course. 3

Figure 1. The human matrisome and its subcategories. The core matrisome comprises three subcategories; ECM glycoproteins, collagens and proteoglycans, in each case defined by their domain structures (see text). All of these proteins are believed to assemble into extracellular matrices of one sort or another. The three main subcategories of matrisome-associated proteins are more inclusively defined they include proteins known to associate with assembled ECM as well as related proteins that may or may not all were included to ensure their capture in omic screens of various sorts. The secreted protein category includes a list of growth factors, cytokines and other secreted proteins some are known to bind to ECM at least part of the time; others are included in the expectation that many of them will also be discovered to bind to ECM. The ECM regulator category includes proteases, protease inhibitors and ECM crosslinking enzymes. Again, many are known to bind to and modify ECM proteins and structures in important ways their homologs are likely also to do so and have been included for completeness. The final category, designated ECM-affiliated includes protein families that some scientists (but not others) may consider as ECM proteins (e.g., mucins, C-type lectins, syndecans, glypicans), some that could be viewed as secreted factors but which also associate with solidphase complexes (e.g., semaphorins and their homologous receptors, plexins, collagen-related proteins such as C1q and homologs) and a few families that appear repeatedly in ECM-enriched preparations for currently unknown reasons (e.g., annexins, galectins). Complete lists of the proteins in each subcategory for both human and mouse, together with gene and protein identifiers as well as protein sequence files are given in Naba et al. (2012) and at ; http://web.mit.edu/hyneslab/matrisome/ and summary tables are given in Hynes and Naba, 2012. References cited Adams J.C. and Engel J. (2007) Bioinformatic analysis of adhesion proteins. Methods Mol. Biol. 370, 147-172. Hohenester E. and Engel J. (2002) Domain structure and organisation in extracellular matrix proteins. Matrix Biol. 21, 115-28. Hynes, R.O. and Naba, A. (2012). Overview of the Matrisome an inventory of extracellular matrix constituents and functions. In Extracellular Matrix Biology (eds. RO Hynes and KM Yamada). Cold Spring Harbor Perspectives in Biology. doi: 10.1101/cshperspect.a004903. Naba, A., Clauser, K.R., Hoersch, S., Liu, H., Carr, S.A. and Hynes, R.O. (2012). The matrisome: in silico definition and in vivo characterization by proteomics of normal and tumor extracellular matrices. Mol. Cell Proteomics. 11(4):M111.014647. 4

5