Proteomics: A Challenge for Technology and Information Science. What is proteomics?

Proteomics: A Challenge for Technology and Information Science
CBCB Seminar, November 21, 2005
Tim Griffin, Dept. of Biochemistry, Molecular Biology and Biophysics
tgriffin@umn.edu

What is proteomics?

"Proteomics includes not only the identification and quantification of proteins, but also the determination of their localization, modifications, interactions, activities, and, ultimately, their function." (Stan Fields, Science, 2001)

Genomics vs. Proteomics

Similarities:
- Large datasets
- Tools needed for annotation and interpretation of results

Differences:
- Genomics: generally mature technologies and data processing methods; the questions asked usually involve quantitative changes in RNA transcripts (microarrays)
- Proteomics: still evolving; protein biochemical properties are complex (expression changes, modifications, interactions, activities); many questions to ask and data to interpret; methods are changing; different approaches (mass spec, arrays, etc.)

Genomics, Proteomics, and Systems Biology

[Figure: the path from genomics (genomic DNA, mRNA) through proteomics (protein products) to computational biology (functional protein system). Maturity labels: mature, prototype, emerging. Technologies and measurements shown: sequencing, arrays, 3D structure, quantitative profiling, protein cataloguing, catalytic activity, subcellular location, protein modifications, protein dynamics, protein phosphorylation, descriptive protein interaction maps, interactions between components. Goals: identify system components; measure and define properties.]

Shotgun identification of proteins in mixtures by LC-MS/MS

Liquid chromatography coupled to tandem mass spectrometry (MS/MS):
1. Digestion: protein(s) -> peptides
2. µLC separation (50-100 µm)
3. Ionization: MALDI or electrospray
4. Isolation
5. Fragmentation: peptides -> peptide fragments
6. Mass analysis

Result: tandem mass spectra (thousands in a matter of hours).

Peptide sequence determination from MS/MS spectra

Collision-induced dissociation (CID) creates two prominent ion series. The b ions retain the N-terminus of the peptide; the y ions retain the C-terminus.

y-series: y14 y13 y12 y11 y10 y9 y8 y7 y6 y5 y4 y3 y2 y1
          H2N-N-S-G-D-I-V-N-L-G-S-I-A-G-R-COOH
b-series: b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14

[Figure: MS/MS spectrum, relative abundance vs. m/z, 200 to 1200.]
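To make the two ion series concrete, here is a minimal Python sketch that computes singly charged b- and y-ion m/z values for the example peptide NSGDIVNLGSIAGR. It assumes standard monoisotopic residue masses and a charge of +1 on every fragment; the function name `fragment_ions` is illustrative and not taken from any search engine mentioned in the talk.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O (Da)
PROTON = 1.007276   # mass of a proton (Da)


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z lists, index 0 = b1 / y1."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses:                 # grow from the N-terminus
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses):       # grow from the C-terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


PEPTIDE = "NSGDIVNLGSIAGR"
b_ions, y_ions = fragment_ions(PEPTIDE)
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i:<2} {b:10.4f}   y{i:<2} {y:10.4f}")
```

Consecutive peaks within one series differ by exactly one residue mass, which is what lets the sequence be read off the spectrum.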

Peptide sequence identifies the protein

[Figure: annotated MS/MS spectrum, relative abundance vs. m/z, 200 to 1200. The fragment ladder reads GDIVNLGSIAGR, DIVNLGSIAGR, IVNLGSIAGR, VNLGSIAGR, NLGSIAGR, LGSIAGR, GSIAGR, SIAGR, IAGR, AGR, GR, R.]

H2N-NSGDIVNLGSIAGR-COOH identifies YMR134W, a yeast protein involved in iron metabolism.

High-throughput protein identification by LC-MS/MS and automated sequence database searching

1. Raw MS/MS spectrum
2. Protein sequence and/or DNA sequence database search
3. Peptide sequence match (H2N-NSGDIVNLGSIAGR-COOH, via the same fragment ladder)
4. Protein identification

Direct identification of 1000 proteins from complex mixtures.
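The database search step can be illustrated with a deliberately simplified Python sketch: digest each database protein in silico with trypsin rules, then keep the peptides whose mass matches the observed precursor mass. Real search engines also score the fragment spectrum; this sketch only does the mass-matching part. The two protein sequences and their IDs are invented for illustration (they are not real yeast sequences), and the 10 ppm tolerance is an assumption.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

# Hypothetical toy database: sequences and IDs are made up for this example.
TOY_DB = {
    "toy_protein_A": "MKTAYIAKNSGDIVNLGSIAGRLLPEK",
    "toy_protein_B": "MSTNPKPQRGGAVLEK",
}


def peptide_mass(seq):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def search(observed_mass, database, tol_ppm=10.0):
    """Return (protein_id, peptide) pairs whose mass is within tol_ppm."""
    hits = []
    for protein_id, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            error_ppm = abs(peptide_mass(peptide) - observed_mass) / observed_mass * 1e6
            if error_ppm <= tol_ppm:
                hits.append((protein_id, peptide))
    return hits


# Simulate an observed precursor mass for the peptide from the slides.
observed = peptide_mass("NSGDIVNLGSIAGR")
hits = search(observed, TOY_DB)
print(hits)
```

In practice the candidate list from mass matching is large, and the fragment ions are what discriminate between candidates.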

Dealing with the data

1. Data acquisition
   - Experimental information, metadata capture
2. Peak analysis
   - Sequence database searching
   - Quantitative analysis
3. Knowledge annotation and interpretation
   - Database mining
   - Assignment of function, pathway, localization, etc.
   - Output for database archiving, publication

Integrated workflow?

1. Data acquisition: capturing experimental information

- Proteomics Experimental Data Repository (PEDRo): proposed schema
- Similar to genomic needs, but the experimental information is a bit different

2. Peak analysis

Computational algorithms for searching MS/MS spectra against protein sequence databases, mRNA sequences, and DNA sequences:
ProFound, Mascot, PepSea, MS-Fit, MOWSE, PeptIdent, MultiIdent, Sequest, PepFrag, MS-Tag

[Figure: MS/MS spectrum (relative abundance vs. m/z, 200 to 1200) -> protein identification.]

Protein identification needs CPU horsepower (parallel computing).

2. Peak analysis: data formats

[Figure: Format 1 -> Output 1; Format 2 -> Output 2; Format 3 -> Output 3; ??]

- Lack of flexibility
- Slow to evolve
- Lack of incorporation of competing products, methods

2. Peak analysis: need general, flexible, in-house solutions

[Figure: Format 1, Format 2, Format 3 -> reverse engineering of data formats -> general tools for analysis of multiple data formats.]

2. Peak analysis: reverse engineering data formats

http://sashimi.sourceforge.net/software_glossolalia.html
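One way to picture "general tools for analysis of multiple data formats" is a set of small readers that all produce the same in-memory spectrum object, so downstream analysis never sees the input format. The Python sketch below is an illustration under assumptions, not the approach of the project linked above: it reads the text-based Mascot Generic Format (MGF) and a second, made-up format called `toycsv` here (first line `title;precursor_mz`, then `mz,intensity` rows). The `Spectrum` class and the `READERS` registry are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Spectrum:
    """Format-independent representation of one MS/MS spectrum."""
    title: str
    precursor_mz: float
    peaks: List[Tuple[float, float]] = field(default_factory=list)


def parse_mgf(text):
    """Read spectra from MGF text (BEGIN IONS ... END IONS blocks)."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = Spectrum(title="", precursor_mz=0.0)
        elif line == "END IONS":
            spectra.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            if key == "TITLE":
                current.title = value
            elif key == "PEPMASS":
                # PEPMASS may carry an optional intensity after the m/z.
                current.precursor_mz = float(value.split()[0])
        elif current is not None:
            mz, intensity = line.split()[:2]
            current.peaks.append((float(mz), float(intensity)))
    return spectra


def parse_toycsv(text):
    """Read one spectrum from the hypothetical 'toycsv' format."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title, precursor = lines[0].split(";")
    peaks = [tuple(float(x) for x in ln.split(",")) for ln in lines[1:]]
    return [Spectrum(title=title, precursor_mz=float(precursor), peaks=peaks)]


# Registry: adding a new format means adding one reader here.
READERS = {"mgf": parse_mgf, "toycsv": parse_toycsv}


def load(text, fmt):
    return READERS[fmt](text)


MGF_TEXT = """BEGIN IONS
TITLE=scan_1
PEPMASS=686.8626
175.119 55.0
232.140 20.0
END IONS
"""
CSV_TEXT = """scan_1;686.8626
175.119,55.0
232.140,20.0
"""

from_mgf = load(MGF_TEXT, "mgf")
from_csv = load(CSV_TEXT, "toycsv")
print(from_mgf == from_csv)
```

The design choice is that only the readers know about file layouts; everything after `load` works on `Spectrum` objects.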

2. Peak analysis: quality control of protein matches

Unfiltered: 10^5 matches (lots of noise and junk) -> filtering -> Filtered: thousands of true matches

Statistical analysis of database results (tools are available).

2. Peak analysis: quantitative analysis

N = normal isotope label
H = heavy isotopic label (e.g. 2H, 13C, 15N)

1. Label one state (State 1, State 2) with N and the other with H.
2. Combine, proteolyze, and isolate labeled peptides.
3. Analyze peptides by mass spectrometry.

Labeling options:
- External chemical labeling
- Metabolic labeling (SILAC)
- Enzymatic incorporation (16O/18O)

[Figure: intensity vs. mass-to-charge (m/z); the N- and H-labeled forms of a peptide appear as two peaks separated by Δm.]

relative protein abundance = [intensity of N-labeled peptide] / [intensity of H-labeled peptide]

Flexibility is key: tools are needed to handle different quantitative methods.
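The abundance ratio above can be sketched in a few lines of Python. This is a minimal illustration, assuming the N- and H-labeled isotope clusters have already been picked out of the spectrum and that summing the intensities within each cluster is an acceptable measure of peptide intensity (the slide only specifies the ratio itself). All peak values are invented.

```python
def relative_abundance(light_peaks, heavy_peaks):
    """Ratio of summed N-labeled intensity to summed H-labeled intensity.

    Each argument is a list of (m/z, intensity) pairs for one isotope cluster.
    """
    light = sum(intensity for _, intensity in light_peaks)
    heavy = sum(intensity for _, intensity in heavy_peaks)
    if heavy == 0:
        raise ValueError("no signal for the H-labeled peptide")
    return light / heavy


# Hypothetical isotope clusters for one peptide pair (made-up values).
light_cluster = [(1000.50, 100.0), (1001.50, 60.0), (1002.50, 20.0)]
heavy_cluster = [(1008.50, 50.0), (1009.50, 30.0), (1010.50, 10.0)]

ratio = relative_abundance(light_cluster, heavy_cluster)
print(f"N/H ratio: {ratio:.2f}")
```

Because chemical, metabolic, and enzymatic labeling produce different mass shifts and cluster shapes, the cluster-picking step is where a tool needs to stay flexible; the ratio calculation itself stays the same.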

2. Peak analysis: quantitative analysis

[Figure: TOF MS spectrum, 20 MCA scans from mm_sample.wiff; intensity in counts (max. 274.0) vs. m/z in amu, 1914 to 1934. Sample 1 isotope cluster: 1916.9909, 1917.9946, 1918.9924, 1920.0007, 1921.0165. Sample 2 isotope cluster: 1924.9803, 1926.0240, 1927.0231, 1928.0203, 1929.0322, 1930.0176, 1931.0077. Relative intensity = relative protein abundance.]

Evolving methodologies: iTRAQ

Sample:        1     2     3     4
               (each sample is digested to peptides)
iTRAQ label:   114   115   116   117

The labeled peptides then go through multidimensional separation and MS/MS.

[Figure: MS/MS spectrum, intensity vs. m/z. The diagnostic ions at 114, 115, 116, and 117 (samples 1 to 4) are used for quantitative analysis; the peptide fragments are used for sequence identification.]

4-way multiplexing: simultaneous comparison of multiple states, replicates.
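Reading out the iTRAQ diagnostic ions can be sketched as follows. This Python example is an illustration under assumptions: the channel targets are rounded to 114.1, 115.1, 116.1, and 117.1, the 0.05 m/z tolerance is arbitrary, and the intensities are invented. The four reporter m/z values are the ones printed on the example reporter-ion spectrum in these slides; the extra peak at 175.119 stands in for an ordinary sequence fragment.

```python
# Assumed nominal reporter-ion positions for the four iTRAQ channels.
REPORTER_CHANNELS = {114: 114.1, 115: 115.1, 116: 116.1, 117: 117.1}


def reporter_intensities(peaks, tol=0.05):
    """Sum the intensity found within +/- tol of each reporter channel."""
    result = {}
    for label, target in REPORTER_CHANNELS.items():
        result[label] = sum(i for mz, i in peaks if abs(mz - target) <= tol)
    return result


def normalize(intensities):
    """Express each channel as a fraction of the total reporter signal."""
    total = sum(intensities.values())
    if total == 0:
        raise ValueError("no reporter-ion signal found")
    return {label: value / total for label, value in intensities.items()}


# m/z values from the slides' reporter spectrum; intensities are made up.
msms_peaks = [
    (114.1005, 40.0),
    (115.0963, 10.0),
    (116.0972, 30.0),
    (117.1025, 20.0),
    (175.119, 55.0),   # a sequence fragment, ignored for quantitation
]

raw = reporter_intensities(msms_peaks)
fractions = normalize(raw)
print(raw)
print(fractions)
```

The same MS/MS spectrum therefore serves two purposes: the low-mass reporter region gives the relative amounts across the four samples, and the remaining fragments identify the peptide.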

Need for changeable tools

Old:
[Figure: the isotope-label TOF MS spectrum from the quantitative analysis slide (20 MCA scans from mm_sample.wiff; max. 274.0 counts; m/z 1914 to 1934 amu). Sample 1 cluster: 1916.9909 to 1921.0165. Sample 2 cluster: 1924.9803 to 1931.0077. Relative intensity = relative protein abundance.]

New:
[Figure: iTRAQ reporter-ion region, intensity vs. m/z. Peak 1: 114.1005; peak 2: 115.0963; peak 3: 116.0972; peak 4: 117.1025.]

Automated analysis tools?

3. Knowledge annotation: making sense of lists of data

3. Knowledge annotation: mining proteomic/genomic databases

3. Knowledge annotation: needs

- Annotation: accession numbers and protein names
- Functional assignments (functional degeneracy?)
- Pathway assignments
- Subcellular localization
- Disease implications
- Comparison of different proteomic datasets (e.g. expression profiles compared to modification state profiles, other protein properties)
- Automated and streamlined??
- Publication and deposit in databases
- Visualization of complex phenomena, interpretation of biological relevance
- Modeling, integration with genomics data: computational and systems biology