Current state of proteomics standardization and (C-)HPP data quality guidelines

DTL focus meeting on data integration, standards and FAIR principles in proteomics, 9/6/2016
Péter Horvatovich, Department of Pharmacy, Analytical Biochemistry

Organization of the Human Proteome Project

Organization of C-HPP I. http://c-hpp.webhosting.rug.nl/tiki-index.php

Organization of C-HPP II. Biobanks; integration of C-HPP and B/D-HPP teams. Slide from Mark Baker

First guideline of C-HPP: Paik YK, et al., Standard Guidelines for the Chromosome-Centric Human Proteome Project, PMID 22443261.

The key to making real headway on the HPP is to agree on a common, shared, globally acceptable big data language. Slide from Mark Baker

The Human Proteome Project Workflow (diagram). Elements shown: individual lab-based MS data; ProteomeXchange (PRIDE, MassIVE, PASSEL); PeptideAtlas; GPMdb; neXtProt PE1-5 classifications; Human Protein Atlas; HPP Guidelines; HPP Metrics; HPP Publications. Slide from Mark Baker

Slide from Lydie Lane

HPP/neXtProt protein existence data from 2013-2016

PE level                                 neXtProt 18/09/2013       neXtProt 12/02/2016
                                         Proteins      %           Proteins      %
PE1  Evidence at protein level           15,649        77.7        16,518        82.4
PE2  Evidence at transcript level only    3,576        17.7         2,290        11.4
PE3  Inferred from homology                 198         1.0           565         2.8
PE4  Predicted                               94         0.5            94         0.5
PE5  Uncertain                              635         3.2           588         2.9
TOTAL                                    20,152       100          20,055       100

Annotation on the slide: "the missing proteins". Slide from Mark Baker

Metrics Used by HPP Teams
- The initial 2013 definition of "missing" was no protein-level data or insufficient documentation for identification (PE2+PE3+PE4+PE5).
- In 2014, this was revised to PE2+PE3+PE4, as PE5 proteins are considered dubious.
Slide from Mark Baker
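The two definitions of "missing proteins" can be applied to the neXtProt counts in the table above. The following is a minimal sketch: the counts are copied from the slide, while the function name and the dictionary layout are illustrative assumptions rather than part of any HPP tool.

```python
# Protein counts per PE level, copied from the neXtProt table on the slide.
COUNTS = {
    "2013-09-18": {"PE1": 15649, "PE2": 3576, "PE3": 198, "PE4": 94, "PE5": 635},
    "2016-02-12": {"PE1": 16518, "PE2": 2290, "PE3": 565, "PE4": 94, "PE5": 588},
}


def missing_proteins(counts, include_pe5):
    """Sum the PE levels counted as 'missing'.

    include_pe5=True  -> 2013 definition (PE2+PE3+PE4+PE5)
    include_pe5=False -> 2014 definition (PE2+PE3+PE4)
    """
    levels = ["PE2", "PE3", "PE4"]
    if include_pe5:
        levels.append("PE5")
    return sum(counts[level] for level in levels)


if __name__ == "__main__":
    for release, counts in COUNTS.items():
        print(
            release,
            "2013 definition:", missing_proteins(counts, include_pe5=True),
            "2014 definition:", missing_proteins(counts, include_pe5=False),
        )
```

Under the 2014 definition the count drops from 3,868 (2013 release) to 2,949 (2016 release).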

A new protein existence viewer: https://search.nextprot.org/view/statistics/protein-existence Slide from Lydie Lane

Nature papers on the draft of the human proteome: PMID 24870542 (84%), PMID 24870543 (92%)

Testing 2014 Claims of Credible MS Evidence for 108/200 ORs (olfactory receptors)
1. Failure to distinguish discriminating (proteotypic) from nondiscriminating peptides
2. Inclusion of many low-quality MS spectra
3. Use of short peptides (< 7 aa)
4. Use of older database builds
Slide from Mark Baker

Human peptides in PeptideAtlas 2014-08
- 133 million PSMs
- 1 million distinct peptides
- 14,000 canonical proteins
- 0.00009 PSM FDR
- 0.0002 peptide FDR
- 0.01 protein FDR
- Only peptides of at least 7 AA
(Chart: percentage of proteins on a 0-100% axis, with a 70% annotation.)
Slide from Eric Deutsch

Olfactory receptor evidence in PeptideAtlas. Slide from Eric Deutsch

Olfactory receptors in PeptideAtlas: only 2 of neXtProt's 473 olfactory receptors are canonical in PeptideAtlas. Slide from Eric Deutsch

Which protein does the peptide implicate?
Spectrum originally identified as: GYIVAAVVK
But a better, exact match is: GYIAVAVVK
This latter sequence is not in our reference proteome, which is why it was not identified correctly.

Is it olfactory receptor OR5A2? (No other corroborating evidence.)
  GIVSVLVVLISYGYIVAAVVKISSATGRTKAFSTCASH
  GYIAVAVVK

Or is it serotransferrin (0.5 million PSMs)?
  SDNCEDTPEAGYFAIAVVKKSASDLTWDNLKGKKS
  GYIAVAVVK
- I→V (dbSNP:rs2692696) is in our reference proteome from UniProt.
- F→I is not in our reference proteome and not in neXtProt. But this protein has many SNPs, and this may be the explanation.
Slide from Eric Deutsch
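The comparison on this slide can be reproduced by sliding the matched peptide along each protein fragment and counting mismatches. This is a minimal sketch: the three sequences are copied from the slide, while the function names and the use of a plain Hamming distance are illustrative assumptions, not the method used by PeptideAtlas.

```python
PEPTIDE = "GYIAVAVVK"  # exact match to the spectrum, per the slide
OR5A2_FRAGMENT = "GIVSVLVVLISYGYIVAAVVKISSATGRTKAFSTCASH"
SEROTRANSFERRIN_FRAGMENT = "SDNCEDTPEAGYFAIAVVKKSASDLTWDNLKGKKS"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_window(peptide, protein):
    """Return (window, mismatches) for the closest same-length window.

    Ties are broken in favour of the leftmost window.
    """
    k = len(peptide)
    scored = [
        (hamming(peptide, protein[i:i + k]), i)
        for i in range(len(protein) - k + 1)
    ]
    mismatches, start = min(scored)
    return protein[start:start + k], mismatches


if __name__ == "__main__":
    for name, fragment in [
        ("OR5A2", OR5A2_FRAGMENT),
        ("serotransferrin", SEROTRANSFERRIN_FRAGMENT),
    ]:
        window, mismatches = best_window(PEPTIDE, fragment)
        print(f"{name}: closest window {window}, {mismatches} mismatches")
```

Both fragments are two substitutions away from the peptide, so sequence distance alone does not settle the question; the slide's argument rests on the abundance of serotransferrin and its known variants.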

Q9H255 = OR51E2
But GPMdb does have this one. This is the only OR that Ron Beavis thinks is legitimate. But it is only observed with a single peptide (many times), in one sample that PeptideAtlas doesn't have.
Ron Beavis: If you check a little closer, the older gene symbol for OR51E2 is PSGR, a prostate-specific G-coupled receptor protein (Cancer Res. 2000 Dec 1;60(23):6568-72). So, I'd actually suggest that this is a true identification and that interpreting the "OR" in the gene name as being literally true is the problem.
Slide from Eric Deutsch

Growth of Human Proteome with Large Datasets from 2014-2015
Note: Savitski/Kuster reanalysis of Wilhelm et al.: 14,741 proteins identified, MCP 2015.
Slide from Gilbert S. Omenn

Latest HPP Guideline
- HUPO: MIAPE, PSI
- NIH-NCI: proteogenomics guideline
- Journals:
  - Journal of Proteome Research
  - Molecular and Cellular Proteomics
  - Proteomics Clinical Applications
- HPP 1.0: data deposition at ProteomeXchange; FDR at PSM, peptide and protein levels
- HPP 2.0: MS data interpretation, PMID 27490519

Manuscript detailing the process: Ternent et al., Proteomics, 2014
Example dataset: PXD000764
- Title: Discovery of new CSF biomarkers for meningitis in children
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data
http://www.proteomexchange.org/submission
Juan A. Vizcaíno, juan@ebi.ac.uk, 13th HUPO World Congress, Madrid, 5 October 2014

PX data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
   a. Complete submissions: result files can be converted to PRIDE XML or the mzIdentML data standard.
   b. Partial submissions: for workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
3. Metadata: sufficiently detailed description of sample origin, workflow, instrumentation, submitter.
4. Other files (optional):
   a. QUANT: quantification-related results
   b. PEAK: peak list files
   c. GEL: gel images
   d. OTHER: any other file type
   e. FASTA
   f. SP_LIBRARY
(Diagram labels: Published, Raw Files, Other files.)
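The distinction between complete and partial submissions can be sketched as a simple check over the file categories in a submission. The labels QUANT, PEAK, GEL, OTHER, FASTA and SP_LIBRARY come from the slide; the labels RAW, RESULT and SEARCH, the function name, and the decision rules are assumptions made for illustration and are not the actual ProteomeXchange validation logic.

```python
from collections import Counter


def submission_type(files):
    """Classify a submission from (filename, category) pairs.

    Assumed rules, for illustration only:
    - spectra must be present as RAW or PEAK files (item 1),
    - RESULT files (PRIDE XML or mzIdentML) make it 'complete' (item 2a),
    - SEARCH files (native search engine output) make it 'partial' (item 2b).
    """
    counts = Counter(category for _, category in files)
    has_spectra = counts["RAW"] > 0 or counts["PEAK"] > 0
    if counts["RESULT"] > 0 and has_spectra:
        return "complete"
    if counts["SEARCH"] > 0 and has_spectra:
        return "partial"
    return "incomplete"


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the function.
    example = [
        ("run01.raw", "RAW"),
        ("run01.mzid", "RESULT"),
        ("run01.mgf", "PEAK"),
        ("human.fasta", "FASTA"),
    ]
    print(submission_type(example))
```

Optional categories such as FASTA or QUANT do not change the outcome in this sketch; they are carried along as supporting files.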

Complete vs Partial submissions: experimental metadata
General experimental metadata about the projects is similar. However, at the assay level, the information in partial submissions is not as detailed.

Complete vs Partial submissions: processed results
For complete submissions, it is possible to connect the spectra with the processed identification results, and they can be visualized.

Complete submissions using mzIdentML
Search engine results + MS files; search engines export to mzIdentML.
An increasing number of tools support export to mzIdentML 1.1:
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb's lab
- OpenMS
- PEAKS
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idconvert tool (ProteoWizard)
- ProteinPilot (planned by the end of 2014)
- Others: library for X!Tandem conversion, lab internal pipelines
Referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementingmzidentml#.

Now: native file export (diagram). Tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others) generate the RESULT file by native export to mzIdentML; the final RESULT file is submitted together with the spectra files.

FDR accumulation when combining datasets
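A back-of-the-envelope calculation illustrates the effect. It is a minimal sketch under two deliberately simple assumptions that are not taken from the slides: every dataset re-identifies the same set of true proteins, and the false positives of different datasets never overlap. The function name and default numbers are hypothetical.

```python
def combined_fdr(n_datasets, proteins_per_dataset=1000, fdr=0.01):
    """Protein-level FDR of the union of n datasets.

    Assumptions (illustrative worst case):
    - each dataset reports `proteins_per_dataset` proteins at the given FDR,
    - true identifications are identical across datasets,
    - false identifications are distinct in every dataset.
    """
    false_per_dataset = round(proteins_per_dataset * fdr)
    true_total = proteins_per_dataset - false_per_dataset  # shared by all datasets
    false_total = false_per_dataset * n_datasets           # accumulates linearly
    return false_total / (true_total + false_total)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 50):
        print(f"{n:>2} datasets: combined protein FDR = {combined_fdr(n):.3f}")
```

With ten datasets at 1% FDR each, the union in this scenario contains 990 true and 100 false proteins, an FDR of about 9%. Real datasets overlap less cleanly, but the direction of the effect is the same, which is why the FDR needs to be re-estimated on the combined result.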

Manual Inspection of Extraordinary Claims Reviewers and readers (and authors) need to see this: Slide from Eric Deutsch

Manual Inspection of Extraordinary Claims Reviewers and readers should not see this: This is what false positives look like Slide from Eric Deutsch

Thank you for your attention! Acknowledgement of all collaborators and members of (C)-HPP participating in C-HPP workshops and HUPO meetings. Questions!