Center for Mass Spectrometry and Proteomics Phone (612) (612)

Outline Database search types Peptide Mass Fingerprint (PMF) Precursor mass-based Sequence tag Results comparison across programs Manual inspection of results Terminology Mass tolerance MS/MS search FASTA Tandem MS Precursor mass Product ion Theoretical mass Experimental mass Sequence tag de novo sequencing

Three Strategies for Protein Inference from Peptide MS or MS/MS Data DATA INPUT: Peptide Mass List (MS1) Peptide MS/MS (MS2) Search TYPE: Peptide Mass Fingerprint (PMF) (historical) Precursor Mass-based 2 Sequencebased (de novo or mass tag) 1 3

1 Peptide Mass Fingerprint (PMF) Historical: after 1D or 2D in-gel Digestion of single bands or spots Current applications: limited; decreasing utility Component Input (Data) Target Search Parameters Description Peptide Mass List Mass type (Mr or MH+) Average or Monoisotopic Protein FASTA Sequence Database Proteolytic enzyme # Missed enzyme cleave sites Amino acid modifications Mass tolerance

Absolute Intensity 1 Peptide Mass Fingerprint: INPUT UNKNOWN PROTEIN (amino acid sequence): GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTG HPETLEKFDKFKHLKTEAEMKASEDLKKHGTVVLT ALGGILKKKGHHEAELKPLAQSHATKHKIPIKYLE FISDAIIHVLHSHPGNFGADAQGAMTKALELFRND IAAKYKELGFQG B) Acquire MALDI-TOF Mass Spectrum of peptide mixture A) In gel Trypsin digestion GLSDGEWQQVLNVWGK VEADIAGHGQEVLIR LFTGHPETLEK TEAEMK ASEDLK HGTVVLTALGGILK GHHEAELKPLAQSHATK YLEFISDAIIHVLHSHPGNFGADAQGAMTK ALELFR NDIAAK ELGFQG m/z C) Experimental Peptide Mass (or m/z) List m/z Peptide 631.71 NDIAAK 650.71 ELGFQG 662.72 ASEDLK 708.81 TEAEMK 748.90 ALELFR 1272.44 LFTGHPETLEK 1379.69 HGTVVLTALGGILK 1607.81 VEADIAGHGQEVLIR 1817.01 GLSDGEWQQVLNVWGK 1855.06 GHHEAELKPLAQSHATK 3241.65 YLEFISDAIIHVLHSHPGNFGADAQGAMTK

1 PMF TARGET: Protein FASTA Sequence Database >sp P31946 1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3 MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN >sp P31946-2 1433B_HUMAN Isoform Short of 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB MDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSSWR VISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFYLK MKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYYE ILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGDAG EGEN >sp P62258 1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1 MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE EQNKEALQDVEDENQ >sp Q04917 1433F_HUMAN 14-3-3 protein eta OS=Homo sapiens GN=YWHAH PE=1 SV=4 MGDREQLLQRARLAEQAERYDDMASAMKAVTELNEPLSNEDRNLLSVAYKNVVGARRSSW RVISSIEQKTMADGNEKKLEKVKAYREKIEKELETVCNDVLSLLDKFLIKNCNDFQYESK VFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAFEISKEQMQPTHPIRLGLALNFS VFYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDE EAGEGN >sp P61981 1433G_HUMAN 14-3-3 protein gamma OS=Homo sapiens GN=YWHAG PE=1 SV=2 MVDREQLVQKARLAEQAERYDDMAAAMKNVTELNEPLSNEERNLLSVAYKNVVGARRSSW RVISSIEQKTSADGNEKKIEMVRAYREKIEKELEAVCQDVLSLLDNYLIKNCSETQYESK VFYLKMKGDYYRYLAEVATGEKRATVVESSEKAYSEAHEISKEHMQPTHPIRLGLALNYS VFYYEIQNAPEQACHLAKTAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDD DGGEGNN >sp P31947 1433S_HUMAN 14-3-3 protein sigma OS=Homo sapiens GN=SFN PE=1 SV=1 MERASLIQKAKLAEQAERYEDMAAFMKGAVEKGEELSCEERNLLSVAYKNVVGGQRAAWR VLSSIEQKSNEEGSEEKGPEVREYREKVETELQGVCDTVLGLLDSHLIKEAGDAESRVFY LKMKGDYYRYLAEVATGDDKKRIIDSARSAYQEAMDISKKEMPPTNPIRLGLALNFSVFH YEIANSPEEAISLAKTTFDEAMADLHTLSEDSYKDSTLIMQLLRDNLTLWTADNAGEEGG EAPQEPQS etc UniProt (example) The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. UniProt is a collaboration between the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). Across the three institutes close to 150 people are involved through different tasks such as database curation, software development and support. http://www.uniprot.org/help/about Example: 20,292 entries for taxonomy: "Homo sapiens (Human) [9606] AND reviewed:yes

1 PMF Search Parameters Review: Trypsin Proteolytic Enzyme and 0 Missed Cleave Site (#MC) modifications Example: Theoretical Trypsin Digest Peptide Mass List (truncated) for a Single Protein: Trypsin = Enzyme Missed Cleave Sites = 0 http://www.expasy.org/tools/peptide-mass.html

1 PMF Search Parameters: Trypsin Proteolytic Enzyme and 1 Missed Cleave Site MSO = Methionine Oxidation (amino acid modification) artif modifications modifications http://www.expasy.org/tools/peptide-mass.html

1 PMF Parameter: Peptide Mass Deviation (+/- m/z) High resolution measurement 631.345 +/- 0.006 Range: 631.339 631.351 3 Mouse proteins with Trypsin peptide in this range Low resolution measurement 631 +/- 1 Range: 630 632 197 Mouse proteins with Trypsin peptide in this range 630 631 632 633 m/z

1 Mass Measurement Error Calculation Error Expression Fractional Error Expression Multiply by: ppm / Theoretical value 1,000,000 % / Theoretical value 100 where Δ = Experimental (or observed) Theoretical m/z value Example: Experimental/observed value (i.e., the data acquired by the mass spectrometer) = 1480.107 m/z Theoretical value (calculated from periodic table, after the peak is identified) = 1480.028 m/z Delta (Δ) = 0.079 Error Expression Error Equation Error ppm (0.079/1480.028)*1,000,000 53 ppm % (0.079/1480.028)*100 0.0053% A USEFUL REFERENCE: Brenton AG, Godfrey AR. Accurate mass measurement: terminology and treatment of data. J Am Soc Mass Spectrom. 2010 Nov;21(11):1821-35.

1 MASCOT Peptide Mass Fingerprint Search http://www.matrixscience.com/

1 MASCOT PMF Search Parameter Page

1 PMF Search QUERY (Experimental Data) m/z Peptide 631.71 NDIAAK 650.71 ELGFQG 662.72 ASEDLK 708.81 TEAEMK 748.90 ALELFR 1272.44 LFTGHPETLEK 1379.69 HGTVVLTALGGILK 1607.81 VEADIAGHGQEVLIR 1817.01 GLSDGEWQQVLNVWGK 1855.06 GHHEAELKPLAQSHATK 3241.65 YLEFISDAIIHVLHSHPGNFGADAQGAMTK Search Score Compare QUERY to THEORETICAL PEPTIDE MASS LIST for each protein in the database Parameters: Enzyme Missed cleave site Amino acid mods Mass tolerance Probability-based MOWSE Score (often, the protein with highest number of peptide matches has the highest score)

1 PMF Search: Results Interpretation Is the protein ID experimentally rational? Does the MW of protein in search results match MW determined by SDS-PAGE? Does the pi of protein in search results match pi determined by 2D-PAGE? Perform MS/MS if no protein match

2 Tandem MS: Precursor Mass-based Search Component Input (Data) Target Search Parameters Description Precursor Mass & Product Ion Masses Charge state Average or Monoisotopic Protein FASTA Sequence Database Proteolytic enzyme # Missed enzyme cleave sites Amino acid modifications Mass tolerances: Peptide/Precursor Mass Product Ions Masses

2 Tandem MS: Precursor Mass-based Search 1. Each MS/MS spectra has 2 data-rich Components 2. Database search for each MS/MS spectra has 2 STEPS

2 Tandem MS: Precursor Mass-based Search REVIEW: Each MS/MS Spectra has Two Data-Rich Components In te n s ity, c o u n ts Example: 612.83 m/z = Precursor m/z +TOF MS: Experiment 1, 35.067 min from bruce w 2-5 humjak3 tryp.wiff a=3.56742302613418490e-004, t0=4.31796646529983260e+001 3748 3600 3400 3200 3000 2800 2600 2400 2200 2000 1800 1600 1400 1200 1000 800 600 400 200 0 COMPONENT 1: Intact Peptide 612.83 613.33 613.83 Max. 3748.0 counts. 611.0 611.5 612.0 612.5 613.0 613.5 614.0 614.5 615.0 615.5 616.0 m/z, amu In te n s ity, c o u n ts +TOF Product (612.8): Experiment 2, 35.118 min from bruce w 2-5 humjak3 tryp.wiff a=3.56742302613418490e-004, t0=4.31796646529983260e+001 2116 2000 1800 1600 a1 1400 86.0972 1200 1000 800 600 400 200 0 COMPONENT 2: Product Ion Mass (m/z) values and Intensities after Peptide Fragmentation 136.0725 199.1716 a2b2 y2 307.1400 249.1534 b3 340.2637 y1 147.1153 y3 420.2225 277.1471 a3-nh3(2+) 289.1373 y5 635.3242 y4 y7(2+) a4 y8(2+) b4 a5 y6 722.3468 y7 885.3852 y8 998.4837 Max. 2116.0 counts. 100 200 300 400 500 600 700 800 900 1000 1100 1200 m/z, amu y9

Tandem MS: Precursor Mass-based Search STEP 1: Find Theoretical Peptides (from in silico Protein Digest) within user specified Precursor Mass Tolerance from MS1 spectrum Mass [M + H] +1 Protein GenBank ID# Peptide 1075.16 gi 10947876 SRNLGPSTSR 1074.22 gi 89109326 RASVDIGISR 1074.30 gi 50808505 MoxRTFMoxISR 1075.11 gi 37654254 QTYENYTR 1074.15 gi 30150130 REEMHESR 1074.24 gi 31542393 IDLVSMoxHSR 1074.26 gi 46402247 IFLPGHYAR 1074.24 gi 18875326 FMLPGDTHR 1074.16 gi 22713614 NMoxRQHDTR 1075.15 gi 20143916 EVPETKDTR PEPTIDE CANDIDATES 2 evaluate 1 st candidate (next slide) etc etc etc

2 Tandem MS: Precursor Mass-based Search STEP 2: Generate Theoretical Product Ion Table for Peptide Candidate SRNLGPSTSR Theoretical Product Ion Table evaluate 1 st candidate Residue a b b-h2o b-nh3 y y-h2o y-nh3 S, Ser 60.0 88.0 70.0 71.0 1074.6 1056.6 1057.5 R, Arg 216.1 244.1 226.1 227.1 987.5 969.5 970.5 N, Asn 330.2 358.2 340.2 341.2 831.4 813.4 814.4 L, Leu 443.3 471.3 453.3 454.2 717.4 699.4 700.4 G, Gly 500.3 528.3 510.3 511.3 604.3 586.3 587.3 P, Pro 597.3 625.3 607.3 608.3 547.3 529.3 530.3 S, Ser 684.4 712.4 694.4 695.3 450.2 432.2 433.2 T, Thr 785.4 813.4 795.4 796.4 363.2 345.2 346.2 S, Ser 872.5 900.5 882.4 883.4 262.2 244.1 245.1 R, Arg 1028.6 1056.6 1038.5 1039.5 175.1 157.1 158.1

2 Tandem MS: Precursor Mass-based Search Compare theoretical product ions to experimental product Ions from MS2 spectrum

Intensity 2 REVIEW: b-type fragment ions NH3+ A E P T I R COOH NH2 NH2 A E P T A E P NH2 A E NH2 A 72.0 201.1 298.1 399.2 m/z

Intensity 2 REVIEW: y-type fragment ions NH3+ A E P T I R COOH HOOC HOOC R I T P R I T HOOC R I HOOC R 174.1 287.2 388.2 485.2 m/z

Intensity 2 REVIEW: b- and y-type Fragment ions: PEPTIDE TANDEM MASS SPECTRUM NH3+ A E P T I R COOH b2 201.1 y3 388.2 b5 512.2 y2 287.2 b4 399.2 y4 485.2 b1 72.0 y1 174.1 b3 298.1 y5 614.2 m/z

2 Tandem MS: Precursor Mass-based Search SUMMARY- for Each MS/MS Spectrum: Generate theoretical product ion tables for all peptide candidates Compare theoretical product ions to experimental product Ions from MS2 spectrum Score all candidate peptides Rank peptides by Score

2 Tandem MS: Precursor Mass-based Search STEP 1 Find Theoretical Peptide Matches from in silico Protein Digest in the range: Experimental Precursor Mass +/- Mass Tolerance STEP 2 For each candidate peptide from Step 1: Compare Theoretical Product Ions (b & y, etc) to Experimental Product ions (data) STEP 3 SCORE: many software programs/ algorithms Peptide rank PROTEIN Report and Grouping: many variations

2 Tandem MS: Precursor Massbased search Software at UM Sequest X!Tandem MASCOT

3 Sequence-based Searches: WHEN? Unsequenced genome (protein(s) of interest are not in the database) HOMOLOGY-based search Search for amino acid MUTATIONS Search for LARGE NUMBERS of Post Translational Modifications (PTM s)

3 Tandem MS: Sequence-based Search STEP 1: Obtain [full or partial] amino acid sequence string directly from the spectrum by de novo Sequencing de novo (Latin) "from the beginning"

3 Tandem MS: Sequence-based Search STEP 1: de novo sequencing: amino acid sequence is determined from delta mass values for a series of successive peptide b- or y-type product ions from a Product Ion (MS/MS) spectrum EXAMPLE de novo sequencing: L 113 V 99 A 71 D 115 L 113 E 129 delta masses between peaks correspond to amino acid residue masses See amino acid residue masses: http://www.ionsource.com/card/aatable/aatable.htm

3 Tandem MS: Sequence-based Search STEP 2: QUERY THE PROTEINS Example: AB Sciex Paragon Algorithm For each MS/MS spectrum, multiple, short sequences (taglets) are generated, scored, and matched against protein sequences. Sequence Temperature Values (STV) rank candidate amino acid sequence segments. STV is related to the number of taglet hits per region. Slide content: courtesy of Sean L. Seymour, AB Sciex

3 Tandem MS: Sequence-based Search STEP 3: Candidate peptide sequences within protein sequences are identified with Precursor mass AND sequence tag information Include: Enzyme specificity Mass tolerance Amino acid modifications Amino Acid mutations STEP 4: Candidates are Ranked and SCORED

3 Tandem MS: Sequence-based Search STEP 3b: Error-tolerant Sequence Tag Reconciliation Includes Amino Acid Mutation Search Component Combination: sequence tag & mass match Slide content: adapted from David L. Tabb, Vanderbilt University: Bioinformatics 2009, Baltimore MD

3 MS/MS Sequence-based Software at UM ProteinPilot (AB Sciex) PEAKS (BSI) Direct Tag (D Tabb Lab) MASCOT (Matrix Science LTD) Error Tolerant MS/MS Sequence Tag

3 MS/MS Sequence-based Software PepNovo (open source de novo) Frank et al, Anal Chem (2005) 77: 964-976 PEAKS Studio (Commercial de novo) Ma et al, RCMS (2003) 17: 2337:2342 Lutefisk (early de novo tool) Taylor et al, RCMS (1997) 11: 1067-1075 Direct Tag (open source tag inference) Tabb et al, J Proteome Res (2008) 7: 3838-3846 InsPecT (open source tag identifier) Tanner et al, Anal Chem (2005) 77: 4626-4639

Maximize Identifications with Multiple Software Programs Single Dataset analyzed with multiple software programs: Blue = number of peptide matches Red = number of protein matches

Maximize Identifications with Multiple Software Programs

Search Algorithm Conclusions 1) Several different search algorithm types exist 2) Each program provides useful results - no unified method exists 3) The same data searched using different algorithms may yield different results (scoring and ranking schemes are distinct)

Manual Inspection of Individual MS/MS Spectra Journal requirement (in special cases; see http://www.mcponline.org/site/misc/phialdelphiaguidelinesfinaldraft.pdf) Inspect SINGLE PEPTIDE HITS Site Localization of Post Translational Modifications (PTM s) Detection of unexpected gene products (alternative splice isoforms)

STEPS for Manual Inspection of MS/MS Spectra 1) Open RAW data; print spectrum (or, multiple zoomed-in regions of the spectrum) 2) Obtain list of Theoretical Fragment Ions (e.g., http://prospector.ucsf.edu/prospector/mshome.htm open MS Product tool 3) Label printout(s) of spectra with fragment ion types 4) DECISION: a. GOOD MATCH: all/most of the peaks (in the raw data) labeled with fragment ion types b. POOR MATCH: unassigned peaks in the raw data

Manual Inspection of MS/MS Spectra Example: RAW data labeled with Fragment Ion Names/Types IT4FYEQFSK IT4 (Fragment Ion Nomenclature Reference: P. Roepstorff & J. Fohlman (2005) Proposal for a Common Nomenclature for Sequence Ions in Mass Spectra of Peptides, Biomedical Spectrometry, 11(11), p 601