Center for Mass Spectrometry and Proteomics Phone (612) (612)

Similar documents
PTM Identification and Localization from MS Proteomics Data

N- The rank of the specified protein relative to all other proteins in the list of detected proteins.

Spectral Counting Approaches and PEAKS

ProteinPilot Software Overview

Application Note TOF/MS

Utilizing novel technology for the analysis of therapeutic antibodies and host cell protein contamination

Improving Productivity with Applied Biosystems GPS Explorer

Proteomics and some of its Mass Spectrometric Applications

De novo sequencing in the identification of mass data. Wang Quanhui Liu Siqi Beijing Institute of Genomics, CAS

ProteinPilot Report for ProteinPilot Software

FACTORS THAT AFFECT PROTEIN IDENTIFICATION BY MASS SPECTROMETRY HAOFEI TIFFANY WANG. (Under the Direction of Ron Orlando) ABSTRACT

Peptide and protein identification in mass spectrometry based proteomics. Yafeng Zhu, PhD student Karolinska Institutet, Scilifelab

Supplemental Materials

Automation of MALDI-TOF Analysis for Proteomics

Spectrum Mill MS Proteomics Workbench. Comprehensive tools for MS proteomics

About OMICS Group Conferences

Proteomics software at MSI. Pratik Jagtap Minnesota Supercomputing institute

Protein Valida-on (Sta-s-cal Inference) and Protein Quan-fica-on. Center for Mass Spectrometry and Proteomics Phone (612) (612)

ProteinPilot Software for Protein Identification and Expression Analysis

1638 Molecular & Cellular Proteomics 6.9

Highly Confident Peptide Mapping of Protein Digests Using Agilent LC/Q TOFs

Practical Tips. : Practical Tips Matrix Science

Bioinformatics for proteomics

Advances in analytical biochemistry and systems biology: Proteomics

基于质谱的蛋白质药物定性定量分析技术及应用

Algorithm for Matching Additional Spectra

MIAPE: Mass Spectrometry Informatics

Quantitative mass spec based proteomics

New Approaches to Quantitative Proteomics Analysis

A New Strategy for Quantitative Proteomics Using Isotope-Coded Protein Labels

Nature Biotechnology: doi: /nbt Supplementary Figure 1. The workflow of Open-pFind.

Proteomics: A Challenge for Technology and Information Science. What is proteomics?

Key questions of proteomics. Bioinformatics 2. Proteomics. Foundation of proteomics. What proteins are there? Protein digestion

RockerBox. Filtering massive Mascot search results at the.dat level

Proteomics And Cancer Biomarker Discovery. Dr. Zahid Khan Institute of chemical Sciences (ICS) University of Peshawar. Overview. Cancer.

Využití cílené proteomiky pro kontrolu falšování potravin: identifikace peptidových markerů v mase pomocí LC- Q Exactive MS/MS

Proteomics Informatics (BMSC-GA 4437)

How to view Results with Scaffold. Proteomics Shared Resource

Protein Grouping, FDR Analysis and Databases.

Quantification of Isotope Encoded Proteins in 2D Gels

Mass Spectrometry Based Proteomics Data Analysis Using GalaxyP

LTQ Orbitrap XL Hybrid FT Mass Spectrometer Unrivaled Performance and Flexibility

Innovations for Protein Research. Protein Research. Powerful workflows built on solid science

Ariadne tutorial 1: RNA identification

ENHANCED PROTEIN AND PEPTIDE CHARACTERIZATION

d Yield of peptide (µm)

Ensure your Success with Agilent s Biopharma Workflows

An Improved Approach for Glycan Structure Identification from HCD MS/MS Spectra

Proteomics. Proteomics is the study of all proteins within organism. Challenges

Agilent Software Tools for Mass Spectrometry Based Multi-omics Studies

Supporting Information for

Put the PRO in Protein Characterization

BIOINFORMATICS ORIGINAL PAPER

Protein Reports CPTAC Common Data Analysis Pipeline (CDAP)

Supporting Online Material for

ProMass HR Applications!

Bioinformatic Tools. So you acquired data.. But you wanted knowledge. So Now What?

Informatics on MS-Based Proteomics

How to view Results with. Proteomics Shared Resource

Really high sensitivity mass spectrometry and Discovery and analysis of protein complexes

High-throughput Proteomic Data Analysis. Suh-Yuen Liang ( 梁素雲 ) NRPGM Core Facilities for Proteomics and Glycomics Academia Sinica Dec.

Faster, More Sensitive Peptide ID by Sequence DB Compression. Nathan Edwards Center for Bioinformatics and Computational Biology

Web based Bioinformatics Applications in Proteomics. Genbank

Supplementary Tables. Note: Open-pFind is embedded as the default open search workflow of the pfind tool. Nature Biotechnology: doi: /nbt.

Basic protein and peptide science for proteomics. Henrik Johansson

Workflows and Pipelines for NGS analysis: Lessons from proteomics

Liver Mitochondria Proteomics Employing High-Resolution MS Technology

Proteomics and Cancer

Database Searching for Protein Identification and Characterization

Application Note # ET-20 BioPharma Compass: A fully Automated Solution for Characterization and QC of Intact and Digested Proteins

Isotopic Resolution of Chromatographically Separated IdeS Subunits Using the X500B QTOF System

Strategies for Quantitative Proteomics. Atelier "Protéomique Quantitative" La Grande Motte, France - June 26, 2007

Mass Spectrometry and Proteomics - Lecture 6 - Matthias Trost Newcastle University

Representing Errors and Uncertainty in Plasma Proteomics

Confident Protein ID using Spectrum Mill Software

Shotgun Proteomics: How Confident are you in that Identification? or Statistical Evaluation of Shotgun Proteomic Data

Sequence Databases and database scanning

ichemistry SOLUTIONS Integrated chemistries to boost mass spectrometry workflows

Optimization-Based Peptide Mass Fingerprinting for Protein Mixture Identification

MBios 478: Mass Spectrometry Applications [Dr. Wyrick] Slide #1. Lecture 25: Mass Spectrometry Applications

Comparability Analysis of Protein Therapeutics by Bottom-Up LC-MS with Stable Isotope-Tagged Reference Standards

Cell Signaling Technology

Appendix. Table of contents

11/22/13. Proteomics, functional genomics, and systems biology. Biosciences 741: Genomics Fall, 2013 Week 11

Exam MOL3007 Functional Genomics

Combination of Isobaric Tagging Reagents and Cysteinyl Peptide Enrichment for In-Depth Quantification

Mass Spectrometry in Proteomics

Filter-based Protein Digestion (FPD): A Detergent-free and Scaffold-based Strategy for TMT workflows

with Database Search of Tandem Mass Spectra

Biotherapeutic Non-Reduced Peptide Mapping

Supplementary information, Figure S1A ShHTL7 interacted with MAX2 but not another F-box protein COI1.

with Database Search of Tandem Mass Spectra

Hongwei Xie, Martin Gilar, and John C. Gebler Waters Corporation, Milford, MA, U.S.A. INTRODUCTION EXPERIMENTAL

Instrumental Solutions for Metabolomics

Nature Biotechnology: doi: /nbt Supplementary Figure 1

Protein De novo Sequencing

LC/MS/MS Solutions for Biomarker Discovery QSTAR. Elite Hybrid LC/MS/MS System. More performance, more reliability, more answers

The best proteomics: combination of upfront reduction of sample complexity, reproducible resolution, and effective follow up

Protein Bioinformatics Part I: Access to information

Supplementary Information

Transcription:

Outline Database search types Peptide Mass Fingerprint (PMF) Precursor mass-based Sequence tag Results comparison across programs Manual inspection of results Terminology Mass tolerance MS/MS search FASTA Tandem MS Precursor mass Product ion Theoretical mass Experimental mass Sequence tag de novo sequencing

Three Strategies for Protein Inference from Peptide MS or MS/MS Data DATA INPUT: Peptide Mass List (MS1) Peptide MS/MS (MS2) Search TYPE: Peptide Mass Fingerprint (PMF) (historical) Precursor Mass-based 2 Sequencebased (de novo or mass tag) 1 3

1 Peptide Mass Fingerprint (PMF) Historical: after 1D or 2D in-gel Digestion of single bands or spots Current applications: limited; decreasing utility Component Input (Data) Target Search Parameters Description Peptide Mass List Mass type (Mr or MH+) Average or Monoisotopic Protein FASTA Sequence Database Proteolytic enzyme # Missed enzyme cleave sites Amino acid modifications Mass tolerance

Absolute Intensity 1 Peptide Mass Fingerprint: INPUT UNKNOWN PROTEIN (amino acid sequence): GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTG HPETLEKFDKFKHLKTEAEMKASEDLKKHGTVVLT ALGGILKKKGHHEAELKPLAQSHATKHKIPIKYLE FISDAIIHVLHSHPGNFGADAQGAMTKALELFRND IAAKYKELGFQG B) Acquire MALDI-TOF Mass Spectrum of peptide mixture A) In gel Trypsin digestion GLSDGEWQQVLNVWGK VEADIAGHGQEVLIR LFTGHPETLEK TEAEMK ASEDLK HGTVVLTALGGILK GHHEAELKPLAQSHATK YLEFISDAIIHVLHSHPGNFGADAQGAMTK ALELFR NDIAAK ELGFQG m/z C) Experimental Peptide Mass (or m/z) List m/z Peptide 631.71 NDIAAK 650.71 ELGFQG 662.72 ASEDLK 708.81 TEAEMK 748.90 ALELFR 1272.44 LFTGHPETLEK 1379.69 HGTVVLTALGGILK 1607.81 VEADIAGHGQEVLIR 1817.01 GLSDGEWQQVLNVWGK 1855.06 GHHEAELKPLAQSHATK 3241.65 YLEFISDAIIHVLHSHPGNFGADAQGAMTK

1 PMF TARGET: Protein FASTA Sequence Database >sp P31946 1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3 MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN >sp P31946-2 1433B_HUMAN Isoform Short of 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB MDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSSWR VISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFYLK MKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYYE ILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGDAG EGEN >sp P62258 1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1 MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE EQNKEALQDVEDENQ >sp Q04917 1433F_HUMAN 14-3-3 protein eta OS=Homo sapiens GN=YWHAH PE=1 SV=4 MGDREQLLQRARLAEQAERYDDMASAMKAVTELNEPLSNEDRNLLSVAYKNVVGARRSSW RVISSIEQKTMADGNEKKLEKVKAYREKIEKELETVCNDVLSLLDKFLIKNCNDFQYESK VFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAFEISKEQMQPTHPIRLGLALNFS VFYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDE EAGEGN >sp P61981 1433G_HUMAN 14-3-3 protein gamma OS=Homo sapiens GN=YWHAG PE=1 SV=2 MVDREQLVQKARLAEQAERYDDMAAAMKNVTELNEPLSNEERNLLSVAYKNVVGARRSSW RVISSIEQKTSADGNEKKIEMVRAYREKIEKELEAVCQDVLSLLDNYLIKNCSETQYESK VFYLKMKGDYYRYLAEVATGEKRATVVESSEKAYSEAHEISKEHMQPTHPIRLGLALNYS VFYYEIQNAPEQACHLAKTAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDD DGGEGNN >sp P31947 1433S_HUMAN 14-3-3 protein sigma OS=Homo sapiens GN=SFN PE=1 SV=1 MERASLIQKAKLAEQAERYEDMAAFMKGAVEKGEELSCEERNLLSVAYKNVVGGQRAAWR VLSSIEQKSNEEGSEEKGPEVREYREKVETELQGVCDTVLGLLDSHLIKEAGDAESRVFY LKMKGDYYRYLAEVATGDDKKRIIDSARSAYQEAMDISKKEMPPTNPIRLGLALNFSVFH YEIANSPEEAISLAKTTFDEAMADLHTLSEDSYKDSTLIMQLLRDNLTLWTADNAGEEGG EAPQEPQS etc UniProt (example) The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. UniProt is a collaboration between the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). Across the three institutes close to 150 people are involved through different tasks such as database curation, software development and support. http://www.uniprot.org/help/about Example: 20,292 entries for taxonomy: "Homo sapiens (Human) [9606] AND reviewed:yes

1 PMF Search Parameters Review: Trypsin Proteolytic Enzyme and 0 Missed Cleave Site (#MC) modifications Example: Theoretical Trypsin Digest Peptide Mass List (truncated) for a Single Protein: Trypsin = Enzyme Missed Cleave Sites = 0 http://www.expasy.org/tools/peptide-mass.html

1 PMF Search Parameters: Trypsin Proteolytic Enzyme and 1 Missed Cleave Site MSO = Methionine Oxidation (amino acid modification) artif modifications modifications http://www.expasy.org/tools/peptide-mass.html

1 PMF Parameter: Peptide Mass Deviation (+/- m/z) High resolution measurement 631.345 +/- 0.006 Range: 631.339 631.351 3 Mouse proteins with Trypsin peptide in this range Low resolution measurement 631 +/- 1 Range: 630 632 197 Mouse proteins with Trypsin peptide in this range 630 631 632 633 m/z

1 Mass Measurement Error Calculation Error Expression Fractional Error Expression Multiply by: ppm / Theoretical value 1,000,000 % / Theoretical value 100 where Δ = Experimental (or observed) Theoretical m/z value Example: Experimental/observed value (i.e., the data acquired by the mass spectrometer) = 1480.107 m/z Theoretical value (calculated from periodic table, after the peak is identified) = 1480.028 m/z Delta (Δ) = 0.079 Error Expression Error Equation Error ppm (0.079/1480.028)*1,000,000 53 ppm % (0.079/1480.028)*100 0.0053% A USEFUL REFERENCE: Brenton AG, Godfrey AR. Accurate mass measurement: terminology and treatment of data. J Am Soc Mass Spectrom. 2010 Nov;21(11):1821-35.

1 MASCOT Peptide Mass Fingerprint Search http://www.matrixscience.com/

1 MASCOT PMF Search Parameter Page

1 PMF Search QUERY (Experimental Data) m/z Peptide 631.71 NDIAAK 650.71 ELGFQG 662.72 ASEDLK 708.81 TEAEMK 748.90 ALELFR 1272.44 LFTGHPETLEK 1379.69 HGTVVLTALGGILK 1607.81 VEADIAGHGQEVLIR 1817.01 GLSDGEWQQVLNVWGK 1855.06 GHHEAELKPLAQSHATK 3241.65 YLEFISDAIIHVLHSHPGNFGADAQGAMTK Search Score Compare QUERY to THEORETICAL PEPTIDE MASS LIST for each protein in the database Parameters: Enzyme Missed cleave site Amino acid mods Mass tolerance Probability-based MOWSE Score (often, the protein with highest number of peptide matches has the highest score)

1

1

1 PMF Search: Results Interpretation Is the protein ID experimentally rational? Does the MW of protein in search results match MW determined by SDS-PAGE? Does the pi of protein in search results match pi determined by 2D-PAGE? Perform MS/MS if no protein match

Three Strategies for Protein Inference from Peptide MS or MS/MS Data DATA INPUT: Peptide Mass List (MS1) Peptide MS/MS (MS2) Search TYPE: Peptide Mass Fingerprint (PMF) (historical) Precursor Mass-based 2 Sequencebased (de novo or mass tag) 1 3

2 Tandem MS: Precursor Mass-based Search Component Input (Data) Target Search Parameters Description Precursor Mass & Product Ion Masses Charge state Average or Monoisotopic Protein FASTA Sequence Database Proteolytic enzyme # Missed enzyme cleave sites Amino acid modifications Mass tolerances: Peptide/Precursor Mass Product Ions Masses

2 Tandem MS: Precursor Mass-based Search 1. Each MS/MS spectra has 2 data-rich Components 2. Database search for each MS/MS spectra has 2 STEPS

2 Tandem MS: Precursor Mass-based Search REVIEW: Each MS/MS Spectra has Two Data-Rich Components In te n s ity, c o u n ts Example: 612.83 m/z = Precursor m/z +TOF MS: Experiment 1, 35.067 min from bruce w 2-5 humjak3 tryp.wiff a=3.56742302613418490e-004, t0=4.31796646529983260e+001 3748 3600 3400 3200 3000 2800 2600 2400 2200 2000 1800 1600 1400 1200 1000 800 600 400 200 0 COMPONENT 1: Intact Peptide 612.83 613.33 613.83 Max. 3748.0 counts. 611.0 611.5 612.0 612.5 613.0 613.5 614.0 614.5 615.0 615.5 616.0 m/z, amu In te n s ity, c o u n ts +TOF Product (612.8): Experiment 2, 35.118 min from bruce w 2-5 humjak3 tryp.wiff a=3.56742302613418490e-004, t0=4.31796646529983260e+001 2116 2000 1800 1600 a1 1400 86.0972 1200 1000 800 600 400 200 0 COMPONENT 2: Product Ion Mass (m/z) values and Intensities after Peptide Fragmentation 136.0725 199.1716 a2b2 y2 307.1400 249.1534 b3 340.2637 y1 147.1153 y3 420.2225 277.1471 a3-nh3(2+) 289.1373 y5 635.3242 y4 y7(2+) a4 y8(2+) b4 a5 y6 722.3468 y7 885.3852 y8 998.4837 Max. 2116.0 counts. 100 200 300 400 500 600 700 800 900 1000 1100 1200 m/z, amu y9

Tandem MS: Precursor Mass-based Search STEP 1: Find Theoretical Peptides (from in silico Protein Digest) within user specified Precursor Mass Tolerance from MS1 spectrum Mass [M + H] +1 Protein GenBank ID# Peptide 1075.16 gi 10947876 SRNLGPSTSR 1074.22 gi 89109326 RASVDIGISR 1074.30 gi 50808505 MoxRTFMoxISR 1075.11 gi 37654254 QTYENYTR 1074.15 gi 30150130 REEMHESR 1074.24 gi 31542393 IDLVSMoxHSR 1074.26 gi 46402247 IFLPGHYAR 1074.24 gi 18875326 FMLPGDTHR 1074.16 gi 22713614 NMoxRQHDTR 1075.15 gi 20143916 EVPETKDTR PEPTIDE CANDIDATES 2 evaluate 1 st candidate (next slide) etc etc etc

2 Tandem MS: Precursor Mass-based Search STEP 2: Generate Theoretical Product Ion Table for Peptide Candidate SRNLGPSTSR Theoretical Product Ion Table evaluate 1 st candidate Residue a b b-h2o b-nh3 y y-h2o y-nh3 S, Ser 60.0 88.0 70.0 71.0 1074.6 1056.6 1057.5 R, Arg 216.1 244.1 226.1 227.1 987.5 969.5 970.5 N, Asn 330.2 358.2 340.2 341.2 831.4 813.4 814.4 L, Leu 443.3 471.3 453.3 454.2 717.4 699.4 700.4 G, Gly 500.3 528.3 510.3 511.3 604.3 586.3 587.3 P, Pro 597.3 625.3 607.3 608.3 547.3 529.3 530.3 S, Ser 684.4 712.4 694.4 695.3 450.2 432.2 433.2 T, Thr 785.4 813.4 795.4 796.4 363.2 345.2 346.2 S, Ser 872.5 900.5 882.4 883.4 262.2 244.1 245.1 R, Arg 1028.6 1056.6 1038.5 1039.5 175.1 157.1 158.1

2 Tandem MS: Precursor Mass-based Search Compare theoretical product ions to experimental product Ions from MS2 spectrum

Intensity 2 REVIEW: b-type fragment ions NH3+ A E P T I R COOH NH2 NH2 A E P T A E P NH2 A E NH2 A 72.0 201.1 298.1 399.2 m/z

Intensity 2 REVIEW: y-type fragment ions NH3+ A E P T I R COOH HOOC HOOC R I T P R I T HOOC R I HOOC R 174.1 287.2 388.2 485.2 m/z

Intensity 2 REVIEW: b- and y-type Fragment ions: PEPTIDE TANDEM MASS SPECTRUM NH3+ A E P T I R COOH b2 201.1 y3 388.2 b5 512.2 y2 287.2 b4 399.2 y4 485.2 b1 72.0 y1 174.1 b3 298.1 y5 614.2 m/z

2 Tandem MS: Precursor Mass-based Search SUMMARY- for Each MS/MS Spectrum: Generate theoretical product ion tables for all peptide candidates Compare theoretical product ions to experimental product Ions from MS2 spectrum Score all candidate peptides Rank peptides by Score

2 Tandem MS: Precursor Mass-based Search STEP 1 Find Theoretical Peptide Matches from in silico Protein Digest in the range: Experimental Precursor Mass +/- Mass Tolerance STEP 2 For each candidate peptide from Step 1: Compare Theoretical Product Ions (b & y, etc) to Experimental Product ions (data) STEP 3 SCORE: many software programs/ algorithms Peptide rank PROTEIN Report and Grouping: many variations

2 Tandem MS: Precursor Massbased search Software at UM Sequest X!Tandem MASCOT

Three Strategies for Protein Inference from Peptide MS or MS/MS Data DATA INPUT: Peptide Mass List (MS1) Peptide MS/MS (MS2) Search TYPE: Peptide Mass Fingerprint (PMF) (historical) Precursor Mass-based 2 Sequencebased (de novo or mass tag) 1 3

3 Sequence-based Searches: WHEN? Unsequenced genome (protein(s) of interest are not in the database) HOMOLOGY-based search Search for amino acid MUTATIONS Search for LARGE NUMBERS of Post Translational Modifications (PTM s)

3 Tandem MS: Sequence-based Search STEP 1: Obtain [full or partial] amino acid sequence string directly from the spectrum by de novo Sequencing de novo (Latin) "from the beginning"

3 Tandem MS: Sequence-based Search STEP 1: de novo sequencing: amino acid sequence is determined from delta mass values for a series of successive peptide b- or y-type product ions from a Product Ion (MS/MS) spectrum EXAMPLE de novo sequencing: L 113 V 99 A 71 D 115 L 113 E 129 delta masses between peaks correspond to amino acid residue masses See amino acid residue masses: http://www.ionsource.com/card/aatable/aatable.htm

3 Tandem MS: Sequence-based Search STEP 2: QUERY THE PROTEINS Example: AB Sciex Paragon Algorithm For each MS/MS spectrum, multiple, short sequences (taglets) are generated, scored, and matched against protein sequences. Sequence Temperature Values (STV) rank candidate amino acid sequence segments. STV is related to the number of taglet hits per region. Slide content: courtesy of Sean L. Seymour, AB Sciex

3 Tandem MS: Sequence-based Search STEP 3: Candidate peptide sequences within protein sequences are identified with Precursor mass AND sequence tag information Include: Enzyme specificity Mass tolerance Amino acid modifications Amino Acid mutations STEP 4: Candidates are Ranked and SCORED

3 Tandem MS: Sequence-based Search STEP 3b: Error-tolerant Sequence Tag Reconciliation Includes Amino Acid Mutation Search Component Combination: sequence tag & mass match Slide content: adapted from David L. Tabb, Vanderbilt University: Bioinformatics 2009, Baltimore MD

3 MS/MS Sequence-based Software at UM ProteinPilot (AB Sciex) PEAKS (BSI) Direct Tag (D Tabb Lab) MASCOT (Matrix Science LTD) Error Tolerant MS/MS Sequence Tag

3 MS/MS Sequence-based Software PepNovo (open source de novo) Frank et al, Anal Chem (2005) 77: 964-976 PEAKS Studio (Commercial de novo) Ma et al, RCMS (2003) 17: 2337:2342 Lutefisk (early de novo tool) Taylor et al, RCMS (1997) 11: 1067-1075 Direct Tag (open source tag inference) Tabb et al, J Proteome Res (2008) 7: 3838-3846 InsPecT (open source tag identifier) Tanner et al, Anal Chem (2005) 77: 4626-4639

Maximize Identifications with Multiple Software Programs Single Dataset analyzed with multiple software programs: Blue = number of peptide matches Red = number of protein matches

Maximize Identifications with Multiple Software Programs

Search Algorithm Conclusions 1) Several different search algorithm types exist 2) Each program provides useful results - no unified method exists 3) The same data searched using different algorithms may yield different results (scoring and ranking schemes are distinct)

Manual Inspection of Individual MS/MS Spectra Journal requirement (in special cases; see http://www.mcponline.org/site/misc/phialdelphiaguidelinesfinaldraft.pdf) Inspect SINGLE PEPTIDE HITS Site Localization of Post Translational Modifications (PTM s) Detection of unexpected gene products (alternative splice isoforms)

STEPS for Manual Inspection of MS/MS Spectra 1) Open RAW data; print spectrum (or, multiple zoomed-in regions of the spectrum) 2) Obtain list of Theoretical Fragment Ions (e.g., http://prospector.ucsf.edu/prospector/mshome.htm open MS Product tool 3) Label printout(s) of spectra with fragment ion types 4) DECISION: a. GOOD MATCH: all/most of the peaks (in the raw data) labeled with fragment ion types b. POOR MATCH: unassigned peaks in the raw data

Manual Inspection of MS/MS Spectra Example: RAW data labeled with Fragment Ion Names/Types IT4FYEQFSK IT4 (Fragment Ion Nomenclature Reference: P. Roepstorff & J. Fohlman (2005) Proposal for a Common Nomenclature for Sequence Ions in Mass Spectra of Peptides, Biomedical Spectrometry, 11(11), p 601