MIAPE: Mass Spectrometry Informatics

Similar documents
MIAPE: Column Chromatography

Center for Mass Spectrometry and Proteomics Phone (612) (612)

N- The rank of the specified protein relative to all other proteins in the list of detected proteins.

Proteomics software at MSI. Pratik Jagtap Minnesota Supercomputing institute

Quantitative Analysis on the Public Protein Prospector Web Site. Introduction

How to view Results with. Proteomics Shared Resource

How to view Results with Scaffold. Proteomics Shared Resource

Highly Confident Peptide Mapping of Protein Digests Using Agilent LC/Q TOFs

About OMICS Group Conferences

Current state of proteomics standardization and (C-)HPP data quality guidelines

LC/MS/MS Solutions for Biomarker Discovery QSTAR. Elite Hybrid LC/MS/MS System. More performance, more reliability, more answers

Proteomics And Cancer Biomarker Discovery. Dr. Zahid Khan Institute of chemical Sciences (ICS) University of Peshawar. Overview. Cancer.

Bioinformatic Tools. So you acquired data.. But you wanted knowledge. So Now What?

LTQ Orbitrap XL Hybrid FT Mass Spectrometer Unrivaled Performance and Flexibility

Tony Mire-Sluis Vice President, Corporate, Product and Device Quality Amgen Inc

faster, confident more path to all the information in your samples. Agilent MassHunter Workstation Software Our measure is your success.

Proteomics: A Challenge for Technology and Information Science. What is proteomics?

PEAKS 8 User Manual. PEAKS Team

Liver Mitochondria Proteomics Employing High-Resolution MS Technology

Enabling Systems Biology Driven Proteome Wide Quantitation of Mycobacterium Tuberculosis

Proteins. Patrick Boyce Biopharmaceutical Marketing Manager Waters Corporation 1

Web based Bioinformatics Applications in Proteomics. Genbank

Maximizing Chromatographic Resolution of Peptide Maps using UPLC with Tandem Columns

ipep User s Guide Proteomics

QUANLYNX APPLICATION MANAGER

Protein De novo Sequencing

The Effect of MS/MS Fragment Ion Mass Accuracy on Peptide Identification in Shotgun Proteomics

MBios 478: Mass Spectrometry Applications [Dr. Wyrick] Slide #1. Lecture 25: Mass Spectrometry Applications

High Resolution Accurate Mass Peptide Quantitation on Thermo Scientific Q Exactive Mass Spectrometers. The world leader in serving science

A Complete Workflow Solution for Intact Monoclonal Antibody Characterization Using a New High-Performance Benchtop Quadrupole- Orbitrap LC-MS/MS

Využití cílené proteomiky pro kontrolu falšování potravin: identifikace peptidových markerů v mase pomocí LC- Q Exactive MS/MS

Modification Site Localization Scoring Integrated into a Search Engine

Proteomics. Proteomics is the study of all proteins within organism. Challenges

AN ACCURATE AND EFFICIENT ALGORITHM FOR PEPTIDE AND PTM IDENTIFICATION BY TANDEM MASS SPECTROMETRY

Algorithm for Matching Additional Spectra

Utilizing novel technology for the analysis of therapeutic antibodies and host cell protein contamination

TurboMass GC/MS Software

Thermo Scientific Peptide Mapping Workflows. Upgrade Your Maps. Fast, confident and more reliable peptide mapping.

Fast and Efficient Peptide Mapping of a Monoclonal Antibody (mab): UHPLC Performance with Superficially Porous Particles

AB SCIEX TripleTOF 5600 SYSTEM. High-resolution quant and qual

Lebedev A, 1 Damoc E, 2 Makarov A, 2 Samguina T 1 1. Moscow State University, Moscow, Russia; 2 ThermoFisher Scientific, Bremen, Germany

A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation

Proudly serving laboratories worldwide since 1979 CALL for Refurbished & Certified Lab Equipment QTRAP 5500

MultiQuant Software Version 3.0

Identification of Microprotein-Protein Interactions via APEX Tagging

The Waters strategy for the quantification of proteins

Fast Multi-blind Modification Search through Tandem Mass Spectrometry* S

Integrated analysis of shotgun proteomic data with PatternLab for proteomics 4.0

ICP Expert software. Technical Overview. Introduction

Measurement & Analytics Measurement made easy. Advanced spectroscopy software for quantitative and qualitative analysis Horizon MB FTIR

Database Searching for Protein Identification and Characterization

Rapidly Characterize Antibody- Drug Conjugates and Derive Drug-to-Antibody Ratios Using LC/MS

rapiflex Innovation with Integrity Designed for Molecules that Matter. MALDI TOF/TOF

Application Note. Authors. Abstract

Monoclonal Antibody Characterization on Q Exactive and Oribtrap Elite. Yi Zhang, Ph.D Senior Proteomic Marketing Specialist Oct.

1638 Molecular & Cellular Proteomics 6.9

Developing a natural partnership for rapid, sensitive identification of binding partners

Analysis of Glyphosate and AMPA in Environmental Water Samples by Ion Chromatography and On-Line Mass Spectrometry with Minimal Sample Preparation

Xevo G2-S QTof and TransOmics: A Multi-Omics System for the Differential LC/MS Analysis of Proteins, Metabolites, and Lipids

ENCODE DCC Antibody Validation Document

Advances in analytical biochemistry and systems biology: Proteomics

TechnicalNOTE METABOLYNX EXACT MASS DEFECT FILTER: RAPIDLY DISCRIMINATE BETWEEN TRUE METABOLITES AND FALSE POSITIVES INTRODUCTION

Version of This Document The current version of this document is: version 1.0, release candidate 5, 20 June 2014.

MasterView Software. Turn data into answers reliable data processing made easy ONE TOUCH PRODUCTIVITY

DeNovoGUI: an open source graphical user interface for. de novo sequencing of tandem mass spectra.

Precise Characterization of Intact Monoclonal Antibodies by the Agilent 6545XT AdvanceBio LC/Q-TOF

Detecting Challenging Post Translational Modifications (PTMs) using CESI-MS

FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE

Conjugated protein biotherapeutics

Risk Management User Guide

PESTICIDE SCREENING APPLICATION SOLUTION

ELE4120 Bioinformatics. Tutorial 5

UNIFI: The user environment Ken Eglinton Nordic User Training, September 2013

Advantages of Ion Mobility QTOF for Characterization of BioPharma Molecules

The LC-MS/MS Workhorse. SCIEX 4500 Series Mass Spectrometers

Laboratory Water Quality Affects Protein Separation by 2D Gel Electrophoresis

Statistically Integrated Metabonomic-Proteomic Studies on a Human Prostate Cancer Xenograft Model in Mice

Generating Automated and Efficient LC/MS Peptide Mapping Results with the Biopharmaceutical Platform Solution with UNIFI

Sequence Databases and database scanning

Rapid Peptide Catabolite ID using the SCIEX Routine Biotransform Solution

Protein Bioinformatics Part I: Access to information

INTERNATIONAL STANDARD

MetaboScape 2.0. Innovation with Integrity. Quickly Discover Metabolite Biomarkers and Use Pathway Mapping to Set them in a Biological Context

Advanced QA/QC characterization MS in QC : Multi Attribute Method

Confident Pesticides Analysis with the Agilent LC/Triple Quadrupole and TOF/QTOF Solutions

Rapid Peptide Mapping via Automated Integration of On-line Digestion, Separation and Mass Spectrometry for the Analysis of Therapeutic Proteins

A Unique LC-MS Assay for Host Cell Proteins(HCPs) ) in Biologics

With the globalization of trade, food production

ENCODE DCC Antibody Validation Document

PLRP-S Polymeric Reversed-Phase Column for LC/MS Separation of mabs and ADC

The clearly better choice

Genotoxicity is the property of a compound

Deliverable D2.1. Project Acronym: Grant agreement no.:

Lecture 23: Metabolomics Technology

UC San Diego UC San Diego Electronic Theses and Dissertations

Featuring Analyst software under Windows NT for enhanced performance and ease of use. API 150EX. LC/MS System

Engineering Genetic Circuits

Exam MOL3007 Functional Genomics

Thermo Scientific Solutions for Intact-Protein Analysis. Better, Faster Decisions for. Biotherapeutic Development

Thermo Scientific Proteomics & Metabolomics Seminar Tour

Transcription:

MIAPE: Mass Spectrometry Informatics Pierre-Alain Binz[1,2]*, Robert Barkovich[3], Ronald C. Beavis[4], David Creasy[5], David M. Horn[6], Randall K. Julian Jr.[7], Sean L. Seymour[8], Chris F. Taylor[9], Yves Vandenbrouck[10] [1] Swiss Institute of Bioinformatics, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland [2] GeneBio SA, Av. de Champel 25, Geneva, Switzerland [3] Affymetrix, Inc., 3420 Central Expressway, Santa Clara, CA 95051, USA [4] Biomedical Research Centre, University of British Columbia, Vancouver, BC, Canada. V6T 1Z3 [5] Matrix Science Ltd, 64 Baker Street, London W1U 7GB, UK [6] Agilent Technologies, 5301 Stevens Creek Blvd., Santa Clara, CA 95051, USA [7] Indigo BioSystems, Inc., Indianapolis, IN, USA [8] Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA 94404, USA [9] European Bioinformatics Institute, Hinxton, UK [10] CEA, DSV, DRDC, laboratoire de Biologie, Informatique et Mathématiques, Grenoble, F-38054 France * Corresponding author Abstract MIAPE - Mass Spectrometry Informatics is one module of the Minimal Information About a Proteomics Experiment (MIAPE) documentation system. MIAPE is developed by the Proteomics Standards Initiative of the Human Proteome Organisation (HUPO-PSI). It aims at delivering a set of guidelines representing the minimal information required to report and sufficiently support assessment and interpretation of a proteomics experiment. This MIAPE Mass Spectrometry Informatics module is the result of a joint effort between the Mass Spectrometry group of HUPO-PSI, the Proteomics Informatics group of HUPO-PSI and the proteomics community. It has been designed to specify a minimal set of information to document a mass spectrometry-based peptide and protein identification and characterization experiment. As for all MIAPE documents, these guidelines will evolve and will be made available on the PSI website at the url http://psidev.info.

MIAPE: Mass Spectrometry Informatics Version 1.1, 5th February, 2008. This module identifies the minimum information required to report the use of protein and peptide identification and characterisation software to analyse the data produced by mass spectrometry experiments, sufficient to support both the effective interpretation and assessment of the data and the potential recreation of the work that generated it. Introduction This document is one of a collection of technologyspecific modules that together constitute the Minimum Information about a Proteomics Experiment (MIAPE) reporting guidelines produced by the Proteomics Standards Initiative. MIAPE is structured around a parent document that lays out the principles to which the individual reporting guidelines adhere. In brief, a MIAPE module represents the minimum information that should be reported about a data set or an experimental process, to allow a reader to interpret and critically evaluate the conclusions reached, and to support their experimental corroboration. In practice a MIAPE module comprises a checklist of information that should be provided (for example about the protocols employed) when a data set is submitted to a public repository or when an experimental step is reported in a scientific publication (for instance in the materials and methods section). The MIAPE modules specify neither the format that information should be transferred in, nor the structure of the repository/text. However, PSI is not developing the MIAPE modules in isolation; several compatible data exchange standards are now well established and supported by public databases and data processing software in proteomics (for details see the PSI website www.psidev.info). The correct analysis of the data produced by mass spectrometry is key to the generation of reliable biological knowledge. Peptide Mass Fingerprints and Peptide Fragment Fingerprints are two types of data that can be used in peptide and protein identification, quantitation, structural characterisation and the investigation of protein modifications. The heterogeneity of sample content and complexity on one hand, and the heterogeneity of mass spectrometers (type, sensitivity, accuracy, efficiency of sample introduction, ionisation, processing) on the other hand have a strong impact on the type, amount and quality of experimental information to be analysed by protein and peptide identification and characterisation software. This is highlighted by the complexity of input parameters required by such software; the results generated are similarly complex in terms of both data structure and pertinence. These guidelines for the reporting of the use of such software do not prescribe that all of that information be captured; and given the diversity of tools currently available, the utility of such detail is clearly open to question. However, it is possible to specify (generic) parameters that are representative of the way in which the software was used, and to contextualise the data generated; enabling a better-informed process of assessment and interpretation. For a full discussion of the principles underlying this specification, please refer to the MIAPE parent document, which can be found on the MIAPE website http://psidev.info. These guidelines cover the use of protein and peptide identification and characterisation software and the data generated. They do not cover the mass spectrometry that generated the data, or the reduction of raw profile data to peak lists; these details are captured in separate MIAPE modules. Note also that these guidelines do not cover all the available features of a protein and peptide identification and characterisation tool (for example, some of the less frequently used parameters, types of spectra or other experimental data). Items falling outside the scope of this module may be captured in complementary modules, which can be obtained from the website. Note that subsequent versions of this document

may have altered scope, as will almost certainly be the case for all the MIAPE modules. The following section, detailing the reporting guidelines for the use of protein and peptide identification and characterisation software, is subdivided as follows: 1. General features; the software employed. 2. Input data and parameters. 3. The output from the procedure; the list of peptides and proteins identified, characterised or quantified. 4. Interpretation and validation. Reporting guidelines for protein and peptide identification and characterisation software 1. General features a) Global descriptors Date stamp (as YYYY-MM-DD) Responsible person (or institutional role if more appropriate); provide name, affiliation and stable contact information Software name, version and manufacturer Customisations made to that software Availability of that software Location of the files generated; parameter files, spectral data (input/output) 2. Input data and parameters a) Input data Description and type of MS data Availability of MS data (source of data, file format) b) Input parameters Databases queried; description and versions (including number of entries searched) Taxonomical restrictions applied Description of tool and scoring scheme Specified cleavage agent(s) Allowed number of missed cleavages Additional parameters related to cleavage Permissible amino acids modifications Precursor-ion and fragment ion mass tolerance for tandem MS (when applicable) Mass tolerance for PMF (when applicable) Thresholds; minimum scores for peptides, proteins (probabilities, number of hits, other metrics) Any other relevant parameters 3. The output from the procedure The procedure might generate all or part of the elements described below (identified proteins, identified peptides, quantization information). Select the elements that apply. a) For identified proteins Accession code in the queried database Protein description Protein scores Validation status Number of different peptide sequences (without considering modifications) assigned to the protein Percent peptide coverage of protein Identity of supporting peptides In the case of PMF, number of matched/unmatched peaks b) For identified peptides Sequence (indicate any deviation from the expected protein cleavage specificity) Peptide scores Chemical modifications (artefactual) and post-translational modifications (naturallyoccurring); sequence polymorphisms with experimental evidence (particularly for isobaric modifications) Corresponding spectrum locus Charge assumed for identification and a measurement of peptide mass error Other additional information, when used for evaluation of confidence c) Quantitation for selected ions Quantitation approach (e.g. 4plex-iTRAQ, ICAT, cicat, COFRADIC) Quantity measurement (e.g. integration of signals, use of signal intensity) Data transformation and normalisation technique (description of method and software) Number of replicates (biological and technical) Acceptance criteria (including measure of errors) Estimates of uncertainty and the methods for the error analysis, including the

treatment of relevant systematic error effects and the treatment of random error issues Results from controls (when described) 4. Interpretation and validation Assessment and confidence given to the identification and quantitation (description of methods, thresholds, values, etc,) Results of statistical analysis or determination of false positive rate in case of large scale experiments Inclusion/exclusion of the output of the software are provided (description of what part of the output has been kept, what part has been rejected) Summary The MIAPE: Mass Spectrometry Informatics minimum reporting guidelines for the use of protein and peptide identification and characterisation software specify that a significant degree of detail be captured, for peptide mass fingerprinting experiments as well as peptide fragment fingerprinting processes. However, it is clear that providing the information required by this document will enable the effective interpretation and assessment of the usage of such software, the qualitative and quantitative assignment of proteins and peptides and potentially, support experimental corroboration. Much of the information required herein may already be stored in an electronic format, or exportable from the instrument; we anticipate further automation of this process. These guidelines will evolve. To contribute, or to track the process to remain MIAPE-compliant, browse to the website at http://psidev.info

Appendix One. The MIAPE-MSI glossary of required items Classification Definition 1. General features (a) Global descriptors Date stamp Responsible person or role Software name, version and manufacturer Customisations Availability of the software Location of the files generated 2. Input data and parameters (a) input data Description and type of MS data Availability of MS data 2.. Input data and parameters (b) input parameters Database queried Taxonomical restrictions Description of tool and scoring scheme Specified cleavage agent(s) Allowed number of missed cleavages The date on which the work described was initiated; given in the standard YYYY-MM-DD format (with hyphens). The (stable) primary contact person for this data set; this could be the experimenter, lab head, line manager etc.. Where responsibility rests with an institutional role (e.g. one of a number of duty officers) rather than a person, give the official name of the role rather than any one person. In all cases give affiliation and stable contact information. For each software used: The trade name of the software used for the identification and/or characterization work, together with the version name or number according to the vendor s nomenclature and vendor name. Any (i.e. affecting behaviour) modification made to the original code and functionality of the software. The references of the vendor or public url if a publicly available version has been used. The location of the data generated. If made available in a public repository, describe the URI (for instance an url, or the url of the repository and the information on how to retrieve the data). If not made available for public access, describe the contact person reference or source and the internal coordinates of the data. Provide a short description that can refer to the data in the experiment (e.g. LC-MS run1). Specify if the submitted data are raw full traces (either as proprietary binary format or exported readable format) or if they are peaklists files. In all cases, specify the original format (i.e. Bruker.yep, Applied Biosystems.wiff, mzdata, mzml, dta (Sequest format), mgf (Mascot generic format) and mzxml (Institute for Systems Biology xml format), other ) Specify the source data. Provide either a URI or the location of the files, or their availability. The description and version of the sequence databank(s) queried, If the databank(s) is/are not available on the web, describe the content of this/these databank(s), including the number of sequences. Specify the amplitude of the subset of the databank(s) (for instance, mammals, a NCBI TaxId, a list of accession numbers). Specify the number of entries searched. Descriptor of the scoring algorithm in the search engine (such as ESI-TRAP in Mascot, ESI ion trap HCTultra in Phenyx, ESI-ION-TRAP in MS-Tag, Ion Trap (4 Da) in X!Tandem, etc) Describe the cleavage agent as available on the search engine. If the cleavage agent rules have been defined by the user, describe the cleavage rules) Allowed maximum number of cleavage sited missed by the specified agent during the in-silico cleavage process.

Additional parameters related to cleavage Permissible amino acids modifications Precursor-ion and fragment-ion mass tolerance for tandem MS (when applicable) Mass tolerance for PMF (when applicable) Thresholds; minimum scores for peptides, proteins Any other relevant parameters This includes, for instance the consideration of semi-specific cleavages (occurring on only one terminus), or other parameter that is relevant to the generation of peptides. Specify the amino acid modifications that have to be considered in the search, according to the search engine list and mode of consideration (fixed or variable for instance). If the user has added custom modification to the predefined lists, specify the modification and the associated rules. For tandem MS queries, specify the mass tolerance applied to precursor ions and to fragment ions submitted to the search engine (when applicable). For PMF and other MS queries, specify the mass tolerance applied to the search engine (when applicable) Describe the parameters associated to the selection of peptide and protein hits retained in the output. These might include, for instance, a minimal score, a maximum p-value, a maximum number of proteins in the output, a minimum number of peptides to match a protein, etc. Any application-specific parameters that are to be set for a search and that have an impact on the interpretation of the result for that experiment. 3. The output from the procedure (a) for identified proteins Protein Accession Code such as a Uniprot-SwissProt AC number (P34521) (not an ID such as YM45_CAEEL), or a ncbi gi number (12803681). In case of the concept of protein group is applied (i.e. a list of protein that share a number of Accession code in the queried database identical peptides among those identified), the description of the protein group might replace the list of individual Accession Codes Protein description Protein description field from the database. Protein scores Validation status Number of different peptide sequences (without considering modifications) assigned to the protein Percent peptide coverage of protein In the case of PMF, number of matched/unmatched peaks Values as reported by the search engine. For all protein hits in the search, specify if accepted without post-processing of search engine/de-novo interpretation (accept raw output of identification software) or if manually accepted as valid or as rejected (false positive). Do consider the number of different peptide sequences. A peptide identified as unmodified and for instance containing an oxidized Methionine is counted as one. A peptide with one missed cleavage and one included peptide with no missed cleavages are considered as two. Expressed as the number of amino acids spanned by the assigned peptides divided by the sequence length. Describe the number of matched m/z values associated to the identified protein, and the number of unmatched signals (or the total number of m/z values in the original spectrum/spectra)

Other additional information, when used for evaluation of confidence This might include retention time, multiplicity of peptide sequence occurrences, flanking residues, etc. 3. The output from the procedure (b) for identified peptides Sequence (notify any deviation from the expected protein cleavage specificity) Peptide scores Chemical modifications (artefactual) and posttranslational modifications (naturallyoccurring); sequence polymorphisms with experimental evidence (particularly for isobaric modifications) Corresponding Spectrum locus Charge assumed for identification and a measurement of peptide mass error Primary sequence of the matched peptides (include number of missed cleavages or if the peptide is issued from semispecific cleavage or fully unspecific cleavage). Scores values and any associated statistical information as available in the output. Occurrence and position of amino acid modifications, being artefactual (such as oxidized Met or Carbamidomethylated Cys), post-translational (such as myristoylated amino acid), issued from an amino acid mutation or a frame shift with respect to the sequence in the database. When a choice among possible isobaric modifications are reported, a justification should be applied (I/L, acetylation/trimethylation Reference to the experimental spectrum. Description of the precursor ion charge state and mass deviation (either expressed as m/z difference or as recalculated mass difference). 3. The output from the procedure (c) quantitation for selected ions Quantitation approach (e.g. 4plex-iTRAQ, ICAT, cicat, COFRADIC) Quantity measurement Data transformation and normalisation technique Number of replicates (biological and technical) Acceptance criteria (including measure of errors) Estimates of uncertainty and the methods for the error analysis, including the treatment of relevant systematic error effects and the treatment of random error issues Describe the experimental protocol used for the quantitation experiment (refer to the sample prep section) Specify the way the quantitation is measured (e.g. integration of signals, use of signal intensity) Describe the method and software: What are the input intensity values, how have they been filtered, transformed, normalised and processed? As part of the protocol. Describe the evaluation method applied to the quantitation software (or manual calculation) result. Describe the acceptance criteria and measures of variability. Include estimates of uncertainty and error tolerances. Include description of treatment of relevant systematic error effects and the treatment of random error issues (such as labelling efficiency, instrument detector saturation, presence of non-unique peptides in MS/MS spectra, interference of distinct peptides with overlapping isotopes or other known or suspected sources of quantification outliers).

Results from controls (when described) In case control experiments have been made, or in case internal controls have been incorporated in the protocol, describe the result of the analysis of these controls. 4. Interpretation and validation Assessment and confidence given to the identification and quantitation (description of methods, thresholds, values, etc.) Results of statistical analysis or determination of false positive rate in case of large scale experiments Inclusion/exclusion of the output of the software are provided (description of what part of the output has been kept, what part has been rejected) Describe the approach the software (when applicable) used for assessing confidence to the identification and quantitation (when applicable). Describe the result of any applied statistics to estimate false positive rate in case of large scale experiments. Provide access to the output data.