Supplementary Tables. Note: Open-pFind is embedded as the default open search workflow of the pfind tool. Nature Biotechnology: doi: /nbt.


Supplementary Tables

Supplementary Table 1. Detailed information for the six datasets used in this study

Dataset | Mass spectrometer | # Raw files | # MS2 scans | Reference
Dong-Ecoli-QE | Q Exactive | 5 | 202,452 | /
Xu-Yeast-QEHF | Q Exactive HF | … | …,301 | /
Mann-Human-Velos | LTQ Orbitrap Velos | 3 | 64,112 | [1]
Gygi-Human-QE | Q Exactive | 24 | 1,121,149 | [2]
Mann-Mouse-QEHF | Q Exactive HF | 4 | 746,116 | [3]
Pandey-Human-Elite (a) | LTQ Orbitrap Elite | 24 | …,913 | [4]

Note: (a) Only the 24 raw files whose names begin with Adult_CD8Tcells_Gel_Elite were chosen. In the entrapment analysis shown in Supplementary Fig. 8, one RAW file was used for each of the four published datasets, namely, _Velos2_AnMi_QC_wt_HCD_iso4_swG for Mann-Human-Velos, b1906_293t_proteinid_01a_qe3_ for Gygi-Human-QE, _QEp8_KiSh_SA_Cerebellum_P05_Singleshot1 for Mann-Mouse-QEHF, and Adult_CD8Tcells_Gel_Elite_44_f01 for Pandey-Human-Elite.

Supplementary Table 2. The eight search engines used in this study

Search engine | Version | Search type
Open-pFind | 1.0 | Open search
PEAKS | 7.5 | Open search
MODa | 1.23 | Open search
MSFragger | v… | Open search
pfind | … | Restricted search
Comet | … | Restricted search
MS-GF+ | v10072 | Restricted search
Byonic | 2.10 | Restricted search

Note: Open-pFind is embedded as the default open search workflow of the pfind tool.

Supplementary Table 3. Parameters for database searches

Items | Settings
Database | Target + Decoy (a)
Enzyme | Trypsin
Digestion | Fully specific for restricted engines and MSFragger; non-specific for Open-pFind, MODa and PEAKS
Max. missed cleavage sites | 3
Mass tolerance of precursor ions | ±20 ppm
Mass tolerance of fragment ions | ±20 ppm (±0.02 Da for search engines that do not support the ppm unit, e.g., PEAKS)
Modifications for restricted search engines | Fixed: carbamidomethylation (C); Variable: oxidation (M), Gln→pyro-Glu (peptide N-termini) and acetylation (protein N-termini)
Modifications for open search engines | Open-pFind and MODa: no modifications; MSFragger and PEAKS: the same modifications as the restricted search engines (b)
Max. modifications per peptide | 4

Note: (a) The human protein database was downloaded from UniProt (…) for Mann-Human-Velos, Gygi-Human-QE and Pandey-Human-Elite. The mouse protein database was downloaded from UniProt (…) for Mann-Mouse-QEHF. Both reviewed and unreviewed proteins were used in this study by default. The E. coli protein database for the K-12 substrain MG1655 was downloaded from NCBI on … for Dong-Ecoli-QE. The six-frame-translated database was used as the target database for Xu-Yeast-QEHF (database generation is described in detail in Online Methods). (b) For PEAKS, the modifications were set to those of the restricted search engines in the PEAKS DB step, and the built-in modification list was then used in PEAKS PTM for modification detection.

Supplementary Table 4. The average number of protein-unique peptides per protein among the proteins co-identified by the eight search engines for the Dong-Ecoli-QE dataset

Search engine | # Protein-unique peptides per protein
Open-pFind | 17.3
PEAKS | 15.3
MSFragger | 14.0
MODa | 10.1
Byonic | 9.6
pfind | 9.2
Comet | 9.0
MS-GF+ | 8.9

Note: A protein-unique peptide is a peptide, defined by its amino acid sequence, that maps to only one protein in the given database.
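The statistic in Supplementary Table 4 can be sketched as follows. This is a minimal illustration under our own assumptions, not the study's implementation: the function name is hypothetical, peptides are matched to proteins by plain substring search with modifications already removed, I/L ambiguity is ignored, and the average is taken over a caller-supplied list of proteins (e.g., the co-identified ones).

```python
def unique_peptides_per_protein(peptides, database, target_proteins):
    """Average number of protein-unique peptides per protein (hypothetical helper).

    peptides: identified peptide sequences (modifications already removed).
    database: dict mapping protein accession -> protein sequence.
    target_proteins: accessions to average over (e.g. co-identified proteins).
    """
    counts = {acc: 0 for acc in target_proteins}
    for pep in set(peptides):  # each distinct sequence is counted once
        # Plain substring search; I/L ambiguity is ignored in this sketch.
        hits = [acc for acc, seq in database.items() if pep in seq]
        # Protein-unique: the sequence maps to exactly one protein.
        if len(hits) == 1 and hits[0] in counts:
            counts[hits[0]] += 1
    return sum(counts.values()) / len(counts) if counts else 0.0
```

A peptide shared by two proteins contributes to neither, which is why engines reporting more shared or short peptides would score lower on this measure.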

Supplementary Table 5. Real search times (in min.) of the eight search engines for the six datasets

Dataset | pfind | Byonic | MS-GF+ | Comet | MSFragger | PEAKS | MODa | Open-pFind (a)
Xu-Yeast-QEHF ,269 4, (158)
Dong-Ecoli-QE (32)
Mann-Human-Velos (78)
Mann-Mouse-QEHF ,178 52, (1,210)
Gygi-Human-QE ,013 27, (903)
Pandey-Human-Elite ,880 12, (414)

Note: All MS/MS data were analyzed on a standard desktop computer (8-core 2.90 GHz CPU and 32 GB RAM), with six threads specified for Open-pFind, MSFragger, pfind, Comet, MS-GF+ and Byonic (the Multicore: Normal setting was used for Byonic). MODa performed single-threaded searches because multithreading was not supported in this version. PEAKS used its built-in strategy (about 6 to 8 threads, as observed in the task manager of the operating system). (a) The single-threaded search time is shown in parentheses.

Supplementary Table 6. The analysis of a single LC-MS/MS run consisting of 41,820 MS/MS spectra in the Gygi-Human-QE dataset

Search engine | Fully specific: Time | # PSM | Semi-specific: Time | # PSM | Non-specific: Time | # PSM
MODa | … | … | … | … | … | …,748
PEAKS | … | … | … | … | … | …,194
MSFragger | 16 | 22,… | … | …,239 | 2,466 | 18,898
Open-pFind (Default) | 8 | 36,… | … | … | … | …
Open-pFind (Unimod-2) | … | … | … | … | … | …
Open-pFind (Blind) | … | … | … | … | … | …,304

Note: The raw file is named b1906_293t_proteinid_01a_qe3_ raw (PXD in ProteomeXchange). The three workflows, namely Default, Unimod-2 and Blind, are introduced in Online Methods. The running time is measured in minutes.

Supplementary Table 7. The results of three open search engines with the T. tengcongensis dataset

Search engine | Fully specific digestion: Time (min.) | # PSM | Non-specific digestion: Time (min.) | # PSM
PEAKS | … | … | … | …,521
MODa | 33 | 26,… | … | …,941
MSFragger | 8 | 38,… | … | …,794
Open-pFind | 4 | 48,… | … | …,829

Note: The dataset contains 113,531 tandem mass spectra and was reported by Chi et al. in 2015 (referred to as TTE-65 in this manuscript); ~38.5% of the total peptides are semi- or non-specifically digested. The T. tengcongensis database was downloaded from UniProt (…) and contains both reviewed and unreviewed proteins. The other parameters were the same as those for the other analyses in this study.
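Supplementary Tables 6 and 7 compare fully specific, semi-specific and non-specific digestion. The sketch below shows one common way to classify an identified peptide by its tryptic termini. It assumes the simple "cleave after K/R" rule (the proline rule and initiator-methionine removal are ignored) and uses only the first occurrence of the peptide in the protein; both are simplifications of ours, and the function name is hypothetical.

```python
def digestion_specificity(peptide, protein):
    """Classify a peptide as 'fully', 'semi' or 'non' specific for trypsin.

    Assumes trypsin cleaves after K or R (no proline rule) and that the
    first occurrence of the peptide in the protein is the relevant one.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)  # exclusive end index

    # N-terminal site is specific at the protein N-terminus or after K/R.
    n_specific = start == 0 or protein[start - 1] in "KR"
    # C-terminal site is specific at the protein C-terminus or if the peptide ends in K/R.
    c_specific = end == len(protein) or peptide[-1] in "KR"

    if n_specific and c_specific:
        return "fully"
    if n_specific or c_specific:
        return "semi"
    return "non"
```

For example, in the toy protein MAKPEPTIDERGGK, PEPTIDER is fully specific, EPTIDER is semi-specific (only its C-terminus follows the rule), and EPTID is non-specific.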

Supplementary Table 8. The running time and the number of identified PSMs with different tag lengths for the four published datasets

Dataset | Tag length | Time (a) (relative change (b)) | Identified PSMs (relative change)
Mann-Human-Velos | 3-tag | 7,758 (647.4%) | 74,772 (-0.6%)
Mann-Human-Velos | 4-tag | 2,603 (150.7%) | 75,032 (-0.2%)
Mann-Human-Velos | 5-tag | 1,038 (0.0%) | 75,203 (0.0%)
Mann-Human-Velos | 6-tag | 602 (-42.0%) | 74,516 (-0.9%)
Gygi-Human-QE | 3-tag | 117,444 (848.9%) | 985,916 (1.1%)
Gygi-Human-QE | 4-tag | 34,228 (176.5%) | 990,940 (1.6%)
Gygi-Human-QE | 5-tag | 12,377 (0.0%) | 975,629 (0.0%)
Gygi-Human-QE | 6-tag | 7,008 (-43.4%) | 939,966 (-3.7%)
Mann-Mouse-QEHF | 3-tag | 152,945 (911.7%) | 683,530 (-0.2%)
Mann-Mouse-QEHF | 4-tag | 46,262 (206.0%) | 687,070 (0.3%)
Mann-Mouse-QEHF | 5-tag | 15,117 (0.0%) | 684,977 (0.0%)
Mann-Mouse-QEHF | 6-tag | 8,992 (-40.5%) | 679,067 (-0.9%)
Pandey-Human-Elite | 3-tag | 53,676 (931.4%) | 388,482 (0.6%)
Pandey-Human-Elite | 4-tag | 15,170 (191.5%) | 388,934 (0.7%)
Pandey-Human-Elite | 5-tag | 5,204 (0.0%) | 386,280 (0.0%)
Pandey-Human-Elite | 6-tag | 3,411 (-34.5%) | 380,884 (-1.4%)

Note: (a) The running time is measured in seconds. (b) The relative changes are calculated with respect to the 5-tag results, which correspond to the default setting of the Open-pFind workflow. For example, for the Mann-Human-Velos dataset, if 4-length tags are used in the open search step, the running time is 2,603 seconds, which is 150.7% longer than that of the 5-tag search.

Supplementary Table 9. The tag frequency and tag-index storage space with different tag lengths

Tag length | Average frequency | Storage space (MB)

Note: The frequency of a tag denotes the number of positions in the protein database that are exactly matched by this tag. For example, 6-length tags appeared 5.2 times in the database on average. Reviewed and unreviewed human proteins (152,493 in total) were downloaded from UniProt and used in this study.
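The "average frequency" in Supplementary Table 9 can be illustrated with a toy tag index. This sketch assumes that a tag is an exact amino-acid substring of length k, that the index maps each distinct tag to all of its (protein, offset) positions, and that the average is taken over distinct tags; I/L merging and Open-pFind's actual index layout are not modeled, and the function names are hypothetical.

```python
from collections import defaultdict


def build_tag_index(proteins, k):
    """Map every length-k tag to the (protein index, offset) positions it matches."""
    index = defaultdict(list)
    for p_idx, seq in enumerate(proteins):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((p_idx, offset))
    return index


def average_tag_frequency(index):
    """Mean number of database positions per distinct tag."""
    if not index:
        return 0.0
    return sum(len(positions) for positions in index.values()) / len(index)
```

Longer tags have fewer matching positions, so each spectrum tag retrieves fewer candidates, which is consistent with the shorter running times for longer tags in Supplementary Table 8; the price is that more distinct tags must be stored.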

Supplementary Table 10. The number of identified proteins and genes in Kim data

Min. pep. FDR (%) Olfactory receptor Average coverage (%) Low coverage (< 10%) proteins Proteins Genes All pep. Unique pep. All pep. Unique pep. 1 19, , ,282 8, , , ,564 3, , , ,231 1, , , , , , , , , , , , , , ,

Note: Min. pep. denotes the minimum number of protein-unique peptides required to support the identification of a protein (2 by default in the main text). The coverage of a protein is defined as the fraction of its amino acids that are supported by at least one peptide. For the coverage calculation, All pep. means that all peptides were used, and Unique pep. means that only protein-unique peptides were used. Only peptides with lengths equal to or greater than 9 are considered in this analysis.
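The coverage definition in the note above can be sketched as follows. This is a minimal illustration, assuming that every occurrence of a peptide in the protein counts toward coverage and that peptides shorter than 9 residues are skipped; the function name is hypothetical. Passing only protein-unique peptides gives the "Unique pep." variant.

```python
def protein_coverage(protein, peptides, min_length=9):
    """Fraction of residues in `protein` covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        if len(pep) < min_length:  # only peptides with length >= 9 are used
            continue
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein) if protein else 0.0
```

Overlapping peptides are not double-counted because coverage is tracked per residue rather than summed per peptide.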

Supplementary Notes

Supplementary Note 1. Using the metabolic labeling technique to estimate the error rates of search engines

NaN ratios (PSMs for which no quantitation ratio could be computed, typically because the expected labeled counterpart of the reported peptide was not found in MS1) can be used to estimate the error rates of different engines independently of the target-decoy strategy. The error rate of a search engine is defined as the fraction of incorrect PSMs among all PSMs reported by this engine. First, we investigated the relationship between decoy PSMs and NaN-ratio PSMs based on the Open-pFind results for the Dong-Ecoli-QE dataset. Fig. S1 shows how the numbers of decoy and NaN-ratio PSMs increase with the number of target PSMs (all PSMs were sorted in ascending order of their scores). The trends of the three curves were quite consistent, and the tails (where nearly all PSMs were incorrect) showed that the proportions of both decoy and NaN-ratio PSMs were stable.

Fig. S1. The relationship between the number of target PSMs and the number of PSMs from the decoy database (green) or with NaN ratios of 15N/14N (red) or 13C/12C (blue) at each score threshold in the Dong-Ecoli-QE dataset. All PSMs are sorted in ascending order of their scores (i.e., the best PSM is ranked first). The subplot shows the linear property of the tails of the three curves. The number of data points (N) used to determine the R² values is 53,225 (located at the tails of the curves, after 180,000).

Therefore, the percentage of NaN-ratio PSMs is useful for estimating the error rates of results from metabolically labeled datasets, in a way that is similar to, but independent of, the traditional target-decoy strategy. Given $M$ as the total number of PSMs and $N$ as the number of NaN-ratio PSMs, we obtain the equation

$$M e r_1 + M (1 - e) r_2 = N, \tag{1}$$

where $e$ denotes the error rate to be estimated, $r_1$ denotes the percentage of NaN-ratio PSMs among incorrect matches (e.g., the target PSMs at the tails of the curves in Fig. S1) and $r_2$ denotes the percentage of NaN-ratio PSMs among correct matches. $r_1$ is calculated with the linear least-squares method, and $r_2$ is estimated from the intersection of the results of different engines, because a PSM that is consistently reported by multiple search engines is more likely to be correct and therefore less likely to be a NaN-ratio PSM (Fig. S2). In this study, the intersection of the results of all eight search engines was used to estimate $r_2$. Finally, the error rate $e$ is estimated using the following formula:

$$e = \frac{N - M r_2}{M (r_1 - r_2)}, \tag{2}$$

and the precision of the given result set is equal to $1 - e$. This formula also shows that if $r_1$ and $r_2$ are correctly estimated from the same dataset (with $r_1 > r_2$), then a smaller percentage of NaN-ratio results indicates a lower error rate, i.e., a higher precision.
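Equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, $r_1$ is taken as the least-squares slope of the cumulative NaN-ratio count against the cumulative target count over the tail of the curves in Fig. S1 (our reading of "linear least-squares method"), and $r_2$ is supplied by the caller.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def estimate_r1(tail_targets, tail_nan):
    """r1 as the slope of cumulative NaN-ratio PSMs vs cumulative target PSMs,
    fitted only on the tail, where nearly all PSMs are assumed incorrect."""
    return least_squares_slope(tail_targets, tail_nan)


def estimate_error_rate(m_total, n_nan, r1, r2):
    """Solve Eq. (1), M*e*r1 + M*(1 - e)*r2 = N, for e, giving Eq. (2)."""
    if r1 <= r2:
        raise ValueError("expected r1 > r2: incorrect PSMs should have NaN ratios more often")
    return (n_nan - m_total * r2) / (m_total * (r1 - r2))
```

As a toy check (numbers invented, not from the study): with $M = 1000$, $N = 68$, $r_1 = 0.5$ and $r_2 = 0.02$, Eq. (2) gives $e = (68 - 20) / (1000 \times 0.48) = 0.1$, i.e., a precision of 0.9. The estimate is not clamped to [0, 1], so noisy inputs can yield values outside that range.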

Fig. S2. The proportions of NaN-ratio PSMs in all possible intersections of the eight result sets from Open-pFind, PEAKS, MODa, MSFragger, MS-GF+, Byonic, Comet and pfind. The numbers of intersections (N) for the eight boxplots are 8, 28, 56, 70, 56, 28, 8 and 1. For example, the number of intersections of any three result sets is $\binom{8}{3} = 56$. Box-plot elements: center line, median; box limits, first and third quartiles (Q1 and Q3); whiskers, from Q1 - 1.5 IQR to Q3 + 1.5 IQR; dots, outlier data points.
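The grouping behind Fig. S2 can be sketched by enumerating every combination of k result sets and measuring the NaN-ratio proportion in their intersection; with eight engines there are $\binom{8}{k}$ combinations for each k. In this sketch, PSM keys such as (spectrum, peptide) pairs, the function name and the choice to skip empty intersections are our assumptions.

```python
from itertools import combinations


def nan_proportions_by_k(results, nan_psms):
    """results: dict engine -> set of PSM keys, e.g. (spectrum_id, peptide).
    nan_psms: set of PSM keys whose quantitation ratio is NaN.
    Returns dict k -> list of NaN proportions, one per non-empty intersection
    of k engines' result sets.
    """
    engines = list(results)
    out = {}
    for k in range(1, len(engines) + 1):
        proportions = []
        for combo in combinations(engines, k):
            # PSMs reported by every engine in this combination.
            common = set.intersection(*(results[e] for e in combo))
            if common:  # empty intersections have no defined proportion
                proportions.append(len(common & nan_psms) / len(common))
        out[k] = proportions
    return out
```

Each list in the output corresponds to one boxplot; if an intersection is empty it is skipped here, so a list may hold fewer than $\binom{8}{k}$ values.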

Fig. S3. Comparison of the estimated precision of consistently and separately identified PSMs between every two search engines using the Dong-Ecoli-QE dataset. 15N- and 13C-labeled peptides are used for estimation, and the final precision is calculated as the average of the two estimates for the same resulting PSMs. Each decimal denotes the estimated precision of the consistently or separately identified PSMs. a) Only the PSMs with common modification types (the four that are specified in the restricted search engines) are considered. b) All PSMs are considered.

In the Dong-Ecoli-QE dataset, the newly estimated precision of the identified PSMs varied within …% across engines when only peptides in the restricted search space were considered (Fig. S3). For the separately identified results, the estimated precision of Open-pFind remained close to 99%, substantially higher than that of the other search engines. Generally, when only peptides with no modifications or only common modifications were considered, all open search engines reported more accurate results than the restricted engines, because peptides from the restricted search space that survive competition in a much larger search space, containing a huge number of competing peptide candidates, are more likely to be correct. However, when all identified peptides were considered, the precision of the open search engines decreased to varying degrees. Open-pFind retained a high global precision of 98.9%, while the precision of the other three open search engines dropped to 93.5% at best and 86.6% at worst. The potential of the metabolic labeling approach warrants further exploration.

Supplementary Note 2. Using the metabolic labeling technique to examine search engine results

The metabolic labeling technique helps reveal why spectra are misidentified by different search engines and can help improve search engine precision. Generally, a spectrum assigned a NaN-ratio peptide by one search engine may be identified as a different, normal-ratio peptide by another search engine. As described above, the normal-ratio peptide is more likely to be the correct identification, so such cases can be used to optimize the scoring function of the former search engine. Fewer than 10% of all NaN-ratio PSMs from Open-pFind were revived by other engines, i.e., identified as normal-ratio peptides (Fig. S4). In contrast, Open-pFind revived ~40% of the NaN-ratio PSMs reported by other search engines.

Fig. S4. The proportions of NaN-ratio PSMs obtained from one engine but revived by others in the Dong-Ecoli-QE dataset. a) Comparison between every two search engines. Each decimal denotes the percentage of PSMs revived by the search engine in the row (leftmost) among the total NaN-ratio PSMs from the search engine in the column (topmost). Only peptides with common modifications are considered. b) Similar to a), but all PSMs, including all types of modifications, are considered. The 15N-labeled peptides and the unlabeled (common) peptides are used to calculate the quantitative values.
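The revival percentages in Fig. S4 can be sketched as follows. We assume each engine's results are a mapping from spectrum ID to (peptide, is_nan_ratio), and that a spectrum counts as revived when the other engine assigns it a different peptide with a normal (non-NaN) ratio; the data layout and function name are hypothetical.

```python
def revival_fraction(engine_a, engine_b):
    """Fraction of engine A's NaN-ratio PSMs that engine B revives.

    engine_a, engine_b: dict spectrum_id -> (peptide, is_nan_ratio).
    """
    nan_spectra = [s for s, (_, is_nan) in engine_a.items() if is_nan]
    if not nan_spectra:
        return 0.0
    revived = 0
    for s in nan_spectra:
        if s in engine_b:
            pep_b, nan_b = engine_b[s]
            # Revived: B reports a different peptide with a valid ratio.
            if not nan_b and pep_b != engine_a[s][0]:
                revived += 1
    return revived / len(nan_spectra)
```

Calling the function in both directions for each engine pair gives the asymmetric matrix of Fig. S4: rows are the reviving engine and columns the engine whose NaN-ratio PSMs are examined.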

Table S1. The fraction of spectra assigned overlapping peptides among the revived spectra from different engines in the Dong-Ecoli-QE dataset

Search engine | # Total peptides (from revived spectra) | # Overlapping peptides (a) | # Overlapping peptides / # Total peptides (%) (b)
MSFragger | 4,161 | 3,669 | 88.2
PEAKS | 1,221 | 1,… | …
MODa | 3,277 | 2,… | …
pfind | … | … | …
MS-GF+ | … | … | …
Comet | … | … | …
Byonic | … | … | …

Note: (a) Two peptides are called overlapping peptides if one peptide sequence is a substring of the other. For example, GCEHVAK and C(+carbamidomethyl)EHVAK are overlapping peptides. (b) The fraction of overlapping peptides among all peptides reported by each search engine. For example, a total of 3,669 spectra identified by MSFragger were assigned peptides overlapping with those reported by Open-pFind, accounting for 88.2% of the total spectra identified by MSFragger.

For the open search engines, Open-pFind reported a peptide overlapping with the one reported by the other engine for ~90% of the revived spectra (Table S1); that is, of the two peptide sequences identified by Open-pFind and the other engine, one is a substring of the other (e.g., GCEHVAK/C(+carbamidomethyl)EHVAK is a pair of overlapping peptides, and each is an overlapping peptide of the other). In other words, the peptide sequences reported by the other open search engines were partially correct, while Open-pFind determined the exact peptide termini and modification types as well as the precise precursor information. For example, Open-pFind reported a C-terminal-specific peptide, carbamyl-GAAGGIGQALALLLK, with an N-terminal carbamylation (P1) for one spectrum (Fig. S5a), while MSFragger reported an overlapping tryptic peptide, VAVLGAAGGLGQALALLLK, with a mass shift of … Da (P2). However, the actual mass difference between these two peptides (P2 - P1) was … Da.
This result implied that the mass shift of … Da reported by MSFragger did not represent a real modification, because there was a ~2 Da difference between the initially exported precursor ion and the actual one confirmed by Open-pFind (Fig. S5b). This finding also demonstrates that exact precursor ions are very important for confirming modification types.
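The overlapping-peptide test from the note to Table S1 can be sketched as a substring check after removing modification annotations. Two assumptions are ours: modifications are written in parentheses, as in C(+carbamidomethyl)EHVAK, and I and L are treated as identical by default, which is what makes the text's own example (GAAGGIGQALALLLK versus VAVLGAAGGLGQALALLLK) count as overlapping. The function names are hypothetical.

```python
import re

# Parenthesized modification annotations, e.g. "(+carbamidomethyl)".
_MOD_PATTERN = re.compile(r"\([^)]*\)")


def strip_mods(peptide, il_equivalent=True):
    """Remove modification annotations; optionally map I to L (same mass)."""
    seq = _MOD_PATTERN.sub("", peptide)
    return seq.replace("I", "L") if il_equivalent else seq


def overlapping(p1, p2, il_equivalent=True):
    """True if one peptide sequence is a substring of the other."""
    a = strip_mods(p1, il_equivalent)
    b = strip_mods(p2, il_equivalent)
    return a in b or b in a
```

For instance, overlapping("GCEHVAK", "C(+carbamidomethyl)EHVAK") is true, and overlapping("VAVLGAAGGLGQALALLLK", "GAAGGIGQALALLLK") is true only when I/L equivalence is enabled.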

Fig. S5. Two example spectra showing how the metabolic labeling technique distinguishes the correct PSMs. +, o and x denote the monoisotopic m/z values of the unlabeled, 15N- and 13C-labeled precursor ions, respectively. The first example is one of 3,669 similar results in the comparison between Open-pFind and MSFragger, and the second is one of 811 similar results in the comparison between Open-pFind and MS-GF+. a) Ecoli-1to1to1-un-C13-N15-60mM dta, identified by Open-pFind as a semi-tryptic peptide, GAAGGIGQALALLLK, with a carbamylation at the N-terminus (m/z = …). MSFragger reported another peptide, VAVLGAAGGLGQALALLLK (m/z = …, Hyperscore = …), with few b ions matched. When the precursor m/z was changed to … for MSFragger (the same value as that used in Open-pFind) and semi-tryptic peptides were allowed in the search, a new peptide, GAAGGLGQALALLLK, was reported with a mass shift of … Da (the monoisotopic mass of carbamylation) and a Hyperscore of …. b) The MS1 information corresponding to the PSM shown in a). c) Ecoli-1to1to1-un-C13-N15-30mM dta, identified by Open-pFind as the peptide ALTEANGDIELAIENMR with a deamidation of N at the 6th position. d) The same spectrum as in c), identified by Comet and MS-GF+ as the peptide ALTEANGDIELAIENMR without any modifications.

e) The same spectrum as in c), identified by Byonic as the peptide ELGDADHGLNMNRGFSK without any modifications. f) The MS1 information corresponding to the PSMs shown in c) to e).

In terms of the restricted search engines, over 90% of the revived peptides reported by MS-GF+ and Comet were partially correct, similar to the behavior of the open search engines (Table S1). However, this proportion was lower for Byonic and pfind. Byonic adopts a different protein FDR control strategy, under which a few low-quality PSMs from reliable proteins may be reported (Online Methods). Another example shows the differences between Open-pFind and the restricted search engines (Fig. S5c-e). For the same spectrum, Open-pFind reported a tryptic peptide with a deamidation, while MS-GF+ and Comet reported the unmodified form of this peptide, which clearly matched fewer fragment ions. Byonic reported a completely different peptide that matched few peaks in the spectrum. The isotopic envelopes of the unlabeled peptide reported by Open-pFind, as well as those of the corresponding 15N- and 13C-labeled forms in MS1, matched the theoretical values precisely. In contrast, the monoisotopic precursor ions of the other two identifications had larger mass deviations, which resulted in invalid quantitation values (Fig. S5f). This example again indicates that the peptides reported by Open-pFind were more accurate and, more importantly, that the metabolic labeling technique is extremely helpful in distinguishing correct individual PSMs, which will facilitate improved search engine design.

Supplementary Note 3. Analysis based on the entrapment strategy shows the robustness of the design of Open-pFind

To analyze the four published datasets, two entrapment databases were downloaded from UniProt and used in this study: a) a small database of the reviewed proteins of Arabidopsis thaliana (8.7 MB, 15,423 protein sequences) and b) a large database of the reviewed proteins of all organisms (261.8 MB, 555,100 protein sequences). Each entrapment database was appended separately to the original database file. The other database search parameters were the same as those shown in Supplementary Table 3. Intuitively, when an entrapment database is included in the search, the identification rate should decrease, because more random peptide candidates are added to the search space, yet few of them are correct answers for any spectrum. Generally, the decrease was more pronounced when the larger entrapment database was used (Fig. S6). The identification rate of Open-pFind was more stable than that of pfind in both situations. For example, the average decreases in the identification rates of Open-pFind and pfind were 1.6 and 4.2, respectively (Fig. S6b). This is because Open-pFind adopts a two-step workflow in which the proteins to be retrieved in the restricted search are automatically learned in the preceding open search step, so that most random peptide candidates that could interfere with the correct candidates are eliminated at this stage. Furthermore, fewer than 5% of the PSMs reported by Open-pFind that matched entrapment sequences were revived by pfind (i.e., pfind identified sequences from the original database for those spectra), whereas 20% to 60% of the corresponding pfind entrapment PSMs were revived by Open-pFind (Fig. S7).
This again indicates that Open-pFind reported more accurate peptides that matched the authentic protein sequences rather than the entrapment sequences, even though both engines were controlled at the same FDR threshold.
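The "proportion of PSMs from the entrapment database" plotted in Fig. S7 and Fig. S8 can be sketched as below. The counting rule is our assumption, not the study's: a PSM is counted as an entrapment hit only if every protein its peptide maps to comes from the entrapment database, so peptides shared with the original database are counted as original. The function name and accession strings are hypothetical.

```python
def entrapment_fraction(psm_proteins, entrapment_accessions):
    """Fraction of PSMs whose peptide maps only to entrapment proteins.

    psm_proteins: list of sets of protein accessions, one set per PSM.
    entrapment_accessions: set of accessions from the entrapment database.
    """
    if not psm_proteins:
        return 0.0
    hits = sum(
        1
        for proteins in psm_proteins
        # Non-empty and entirely contained in the entrapment set.
        if proteins and proteins <= entrapment_accessions
    )
    return hits / len(psm_proteins)
```

A stricter or looser rule (e.g., counting any entrapment match) would change the absolute percentages, so the rule should match the one used when comparing engines.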

Fig. S6. Decreased identification rates caused by the entrapment strategy for the four datasets. a) Proteins from Arabidopsis thaliana were used as the entrapment database. b) Proteins from all organisms recorded in UniProt were used as the entrapment database.

Fig. S7. Open-pFind revived more spectra than pfind. The orange curves denote the proportion of PSMs from the entrapment database. a) Proteins from Arabidopsis thaliana were used as the entrapment database. b) Proteins from all organisms recorded in UniProt were used as the entrapment database.

We also used the entrapment strategy to evaluate the precision of the search engines on the Dong-Ecoli-QE dataset (the reviewed human database downloaded from UniProt was used as the entrapment database), and Open-pFind performed similarly to how it did on the four large-scale datasets. When searching against the target and entrapment databases, Open-pFind reported the largest number of PSMs with the smallest proportion matched to entrapment proteins (Fig. S8a). Similar to the analysis above, fewer than 10% of the entrapment PSMs from Open-pFind were revived by the other engines, while 22% to 56% of the entrapment PSMs from the other engines were revived by Open-pFind (Fig. S8b).

Fig. S8. Entrapment analysis of the Dong-Ecoli-QE dataset. a) The number of identified PSMs (blue bars, including PSMs from both the original and entrapment protein databases) and the percentage of PSMs from the entrapment database (orange curve). b) The number and proportion of PSMs identified with entrapment peptides by one engine and revived by the other engine. For example, 359 entrapment PSMs were identified by PEAKS and revived by Open-pFind, accounting for 49.8% of the total entrapment PSMs identified by PEAKS.

Supplementary Note 4. Nearly 100% of high-quality spectra in the four published datasets are identified within a comprehensive search space

We also investigated why a few spectra remained uninterpretable for Open-pFind. First, spectra were classified according to the lengths of their longest tags, which serve as a feature related to spectral quality. For example, a 0-length tag indicates that no mass difference between any two peaks equals the mass of any amino acid residue within a given fragment ion tolerance. A spectrum with a longer tag is more likely to have been generated by a real peptide, because it provides more fragmentation information. Generally, the identification rates of spectra with longer tags were higher for all engines (Fig. S9). For all four datasets, the identification rate of Open-pFind was always greater than 90% and even close to 100% for spectra with tags longer than ten, suggesting that the search space of Open-pFind is close to complete for routine MS/MS data analysis. Additionally, the scoring scheme of Open-pFind effectively distinguishes correct peptides from random peptides, even in such an ultra-large search space. The identification rates of Byonic decreased sharply for spectra with longer tags in the Mann-Mouse-QEHF dataset (Fig. S9c), likely because this dataset contains more high-mass peptides whose precursor ions were not accurately exported. Among all PSMs identified by Open-pFind in this dataset, 55.0% had precursor masses larger than 1,500 Da, of which only 50.1% were initially exported by the vendor's software. In the other datasets, the proportion of precursors larger than 1,500 Da was markedly smaller; for example, it was only 38.8% for the Pandey-Human-Elite dataset, of which 82.1% were initially extracted by the vendor's software.
We also tested pfind using the precursor ions extracted by the vendor software rather than by pparse, and the distribution of its identification rates was similar to that of Byonic (Fig. S10), which again shows that extracting accurate precursor ions is very important for search engine design.
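The tag-length feature described above can be computed as a longest-path problem over the sorted fragment peaks: two peaks are linked when their m/z difference matches an amino acid residue mass, and the longest chain of links is the tag length. This is a generic sketch, not necessarily how Open-pFind extracts tags. It assumes singly charged fragment ions, a fixed absolute tolerance in Da (0.02 Da here, echoing the fallback fragment tolerance in Supplementary Table 3) and unmodified residue masses (carbamidomethylated C is not included); the function name is hypothetical.

```python
# Monoisotopic residue masses (Da); I and L share the same mass.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def longest_tag_length(peaks, tol=0.02, residue_masses=RESIDUE_MASSES):
    """Length of the longest run of consecutive residue-mass gaps between peaks."""
    mzs = sorted(peaks)
    masses = list(residue_masses.values())
    # best[j] = length of the longest tag ending at peak j.
    best = [0] * len(mzs)
    for j in range(len(mzs)):
        for i in range(j):
            diff = mzs[j] - mzs[i]
            if any(abs(diff - m) <= tol for m in masses):
                best[j] = max(best[j], best[i] + 1)
    return max(best, default=0)
```

With peaks separated by G, A and S residue masses plus one unrelated noise peak, the function returns 3; a spectrum with no residue-mass gaps returns 0, matching the definition of a 0-length tag. The quadratic loop is fine for illustration but not for production-scale spectra.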

Fig. S9. Analyses of the unidentified spectra with different maximum tag lengths in the four datasets. The curves denote the identification rates of the spectra with different maximum tag lengths, and the histograms denote the distribution of the total number of spectra at each tag length.

Fig. S10. The distribution of the identification rates of Byonic and pfind at different maximum tag lengths extracted from the spectra. Two modes are used for pfind; the only difference between them is whether pparse is used to calibrate the precursor ions.

Supplementary Note 5. Comprehensive analysis of the Kim data

The average identification rate was 62.5% for all 85 samples, and over 70% of the spectra were identified for the in-gel digested samples analyzed on an LTQ Orbitrap Velos (Fig. S11a). The Open-pFind results demonstrate that the characteristics of MS/MS data vary with the sample preparation and LC-MS/MS methods. In terms of modifications, although several common modifications, e.g., carbamidomethylation, oxidation and Gln pyro-glu, were always abundant in all datasets, many unexpected modifications appeared in only one or two types of datasets (Fig. S11b). For example, propionamide on cysteines was hardly detected in the brplc-fractionated samples but was one of the most abundant cysteine modifications in the peptides from in-gel digested samples (Supplementary Data 3), consistent with a previous study by Sechi et al. [5]. On the other hand, the percentages of fully tryptic peptides were stable among the four types of datasets with different experimental conditions (97% to 99%, as derived from Fig. S11c). In terms of co-eluting peptide identification, the LTQ Orbitrap Elite tended to produce more mixed spectra than the LTQ Orbitrap Velos, likely because its higher sensitivity allows less-abundant peptides to be detected and identified by Open-pFind (Fig. S11d). The different characteristics of these datasets again show that specifying an appropriate search space for each individual dataset based on expert experience is always difficult, and that uniformly considering a comprehensive search space across experimental conditions is essential for today's search engines. Moreover, biological modifications and mutations were effectively discovered by Open-pFind. For example, laminin subunit gamma-1 was identified by different types of peptides, each supported by over ten PSMs (Fig. S11e).
The N-terminal cleavage site of QAAMDECTDEGGRPQR was confirmed by the signal peptide annotation in UniProt. In addition, two amino acid mutations were discovered by Open-pFind, one of which, R1121Q, had been verified previously (rs20559 in dbSNP [6]). Identification results from the extended search space were also valuable for other biological discoveries. For example, a total of 9,559 semi-tryptic peptides were identified as being located in the N-terminal regions of proteins (the C-terminal amino acid of each peptide is located before the 60th amino acid of the corresponding protein), of which 34.1% had complete ion series (at least one b or y ion detected at each peptide bond), and 66.4% had at most two peptide bonds at which both the b and y ions were missing. These semi-tryptic peptides provide valuable clues for identifying signal peptides, and 694 of them were already verified in UniProt (Supplementary Data 4). The score distributions of these 9,559 peptides and of all 548,371 peptides (Fig. S12) indicate that although these semi-tryptic peptides came from a much larger search space (Supplementary Fig. 10), their confidence was still comparable to that of the overall results.
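The two criteria used above for the N-terminal semi-tryptic peptides, the position cutoff and the ion-series completeness, can be sketched as follows. Assumptions of ours: fragment annotations are given as sets of observed b- and y-ion indices (charge states merged), positions are 1-based, and the function names are hypothetical. Bond k joins residues k and k+1 and is supported by either b_k or y_(n-k).

```python
def missing_bonds(peptide_length, b_ions, y_ions):
    """1-based indices of peptide bonds where both the b and y ions are missing.

    b_ions, y_ions: sets of observed ion indices, e.g. {1, 2} for b1 and b2.
    """
    n = peptide_length
    return [k for k in range(1, n) if k not in b_ions and (n - k) not in y_ions]


def in_n_terminal_region(peptide_end, cutoff=60):
    """peptide_end: 1-based protein position of the peptide's C-terminal residue."""
    return peptide_end < cutoff
```

A peptide has a complete ion series when missing_bonds(...) is empty, and falls into the "at most two" group when the list has length 2 or less.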

Fig. S11. Profiling the Kim data using Open-pFind. a) The distribution of the identification rates of each RAW file. Each boxplot denotes the distribution for one type of experimental setting (brp_velos, brp_elite, Gel_Velos and Gel_Elite; N = 338, 775, 585 and 514 raw files in the four boxplots from left to right, respectively). Box-plot elements: center line, median; box limits, first and third quartiles (Q1 and Q3); whiskers, from Q1 - 1.5 IQR to Q3 + 1.5 IQR; grey dots, outlier points. b) The distribution of highly abundant modifications. Each number in a cell denotes the percentage of modified amino acids among all amino acids of that type that appeared in the identified peptides. For example, 79.7% of cysteines were modified by carbamidomethylation in the identified peptides from brplc-fractionated samples analyzed on an LTQ Orbitrap Velos. c) The distribution of the fraction of semi- and non-specific peptides under different experimental conditions. Each boxplot denotes the distribution for one type of experimental setting (brp_velos, brp_elite, Gel_Velos and Gel_Elite; N = 338, 775, 585 and 514 raw files in the four boxplots from left to right, respectively). Box-plot elements: center line, median; box limits, first and third quartiles (Q1 and Q3); whiskers, from Q1 - 1.5 IQR to Q3 + 1.5 IQR. d) The distribution of the number of peptides identified from one spectrum. For example, 7.5% of the identified spectra from brplc-fractionated samples analyzed on an LTQ Orbitrap Velos each contribute two peptides. e) The identified peptides in laminin subunit gamma-1. Red numbers in brackets denote the number of PSMs corresponding to each peptide.

Fig. S12. The score distributions of the 9,559 semi-tryptic peptides and of all 548,371 peptides identified in the Kim data.
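The percentages in Fig. S11b (e.g., 79.7% of cysteines carbamidomethylated) can be sketched as follows. We assume peptide-level input given as (sequence, {0-based position: modification name}) and count every residue occurrence in each identified peptide once; the input format and function name are hypothetical.

```python
from collections import Counter


def modification_percentages(peptides):
    """Percentage of each residue type carrying each modification.

    peptides: iterable of (sequence, {0-based position: modification name}).
    Returns {(residue, modification): percentage}.
    """
    residue_totals = Counter()
    modified = Counter()
    for seq, mods in peptides:
        residue_totals.update(seq)  # counts every residue occurrence
        for pos, name in mods.items():
            modified[(seq[pos], name)] += 1
    return {
        key: 100.0 * count / residue_totals[key[0]]
        for key, count in modified.items()
    }
```

Whether to count distinct peptides or every PSM changes the weighting, so the input should match the level at which Fig. S11b is reported.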

Supplementary References

1. Michalski, A. et al. Mass spectrometry-based proteomics using Q Exactive, a high-performance benchtop quadrupole Orbitrap mass spectrometer. Mol Cell Proteomics 10, M (2011).
2. Chick, J.M. et al. A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nat Biotechnol 33, (2015).
3. Sharma, K. et al. Cell type- and brain region-resolved mouse brain proteome. Nat Neurosci 18, (2015).
4. Kim, M.S. et al. A draft map of the human proteome. Nature 509, (2014).
5. Sechi, S. & Chait, B.T. Modification of cysteine residues by alkylation. A tool in peptide mapping and protein identification. Anal Chem 70, (1998).
6. Sherry, S.T., Ward, M. & Sirotkin, K. dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res 9, (1999).

More information

Data Quality Control in Peptide Identification

Data Quality Control in Peptide Identification Data Quality Control in Peptide Identification Yunping Zhu State Key Laboratory of Proteomics Beijing Proteome Research Center Beijing Institute of Radiation Medicine Beijing, 2010-11-11 Data quality control

More information

Precision de novo peptide sequencing using mirror proteases of Ac-LysargiNase and

Precision de novo peptide sequencing using mirror proteases of Ac-LysargiNase and Mol Cell Proteomics Papers in Press. Published on January 8, 2019 as Manuscript TIR118.000918 Precision de novo peptide sequencing using mirror proteases of Ac-LysargiNase and trypsin for large-scale proteomics

More information

Progenesis QI for proteomics HCP Spectral Library User Guide

Progenesis QI for proteomics HCP Spectral Library User Guide Progenesis QI for proteomics HCP Spectral Library User Guide Analysis workflow guidelines for MS E data Contents Introduction... 3 How to use this document... 3 How can I analyse my own runs using Progenesis

More information

for water and beverage analysis

for water and beverage analysis Thermo Scientific EQuan MAX Plus Systems Automated, high-throughput LC-MS solutions for water and beverage analysis Pesticides Pharmaceuticals Personal care products Endocrine disruptors Perfluorinated

More information

Strategies in proteomics

Strategies in proteomics Strategies in proteomics Systems biology - understand cellpathways, network, and complex interacting (includes Genomics, Proteomics, Metabolomics..) Biological processes - characterize protein complexes,

More information

Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz

Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz Table of Contents Supplementary Note 1: Unique Anchor Filtering Supplementary Figure

More information

Supplementary Figure 1: MS Data Quality (1/2)

Supplementary Figure 1: MS Data Quality (1/2) Supplementary Figure 1: MS Data Quality (1/2) a.4.3 Precursor Mass Error Plot Median Absolute Mass Error:.72 ppm b.2.15 Fragment Mass Error Plot Median Absolute Mass Error: 1.95 ppm.2.1.1.5. 6 4 2 2 4

More information

ipep User s Guide Proteomics

ipep User s Guide Proteomics Proteomics ipep User s Guide ipep has been designed to aid in designing sample preparation techniques around the detection of specific sites or motifs in proteins and proteomes. The first application,

More information

MIAPE: Mass Spectrometry Informatics

MIAPE: Mass Spectrometry Informatics MIAPE: Mass Spectrometry Informatics Pierre-Alain Binz[1,2]*, Robert Barkovich[3], Ronald C. Beavis[4], David Creasy[5], David M. Horn[6], Randall K. Julian Jr.[7], Sean L. Seymour[8], Chris F. Taylor[9],

More information