Proteomics Technology Note

Targeted Proteomics for the Identification of Nucleic Acid Binding Proteins in E. coli: Finding more low abundant proteins using the QSTAR LC/MS/MS System, Multi-Dimensional Liquid Chromatography, Pro Automate Software, and the Celera Discovery System™

Christie Hunter, Lydia Nuwaysir
Applied Biosystems, CA, USA

Introduction

Two-dimensional gel electrophoresis coupled with mass spectrometry is regarded as a powerful tool for the separation and identification of complex protein samples. Despite its high resolving power, the technique has limitations for the separation and identification of membrane and low abundance proteins. Multidimensional liquid chromatography (MDLC) coupled to mass spectrometry provides an alternative to this approach and allows access to these proteins. In the most popular form of this technique, protein mixtures are enzymatically digested and the resulting peptides are loaded onto a cation exchange chromatography column for fractionation, either off-line or on-line, via a salt gradient. These fractions are then further separated by reverse phase LC coupled to a mass spectrometer.

To ensure good coverage of the proteome, it is becoming increasingly apparent that a more targeted approach to proteomics is advantageous. In addition, the ability to acquire data in a more results-driven manner is essential. In this work, we use a protein function-based sample simplification step, followed by multiple rounds of MDLC using Specific Mass and Retention Time (SMART) exclusion lists generated from protein identification results, to increase protein coverage and dig deeper into the proteome.

Figure 1. Fatty acid responsive transcription factor (FADR)-DNA complex: transcriptional control of fatty acid metabolism in E. coli.

MDLC in combination with a QSTAR Pulsar quadrupole time-of-flight mass spectrometer was used to investigate a subset of E. coli proteins, the nucleic acid binding proteins. These proteins were isolated from an E. coli lysate using a DNA affinity column. Multiple MDLC runs were performed using sequential SMART exclusion lists created from protein identifications in the previous runs. To assess the relative merit of this approach for identification of low abundance proteins, codon adaptation index (1) values were calculated on the protein results after each analysis. Finally, the molecular functions and biological processes of the identified proteins were investigated using the PANTHER Protein Function-Family Browser (2) in the Celera Discovery System (3) to gain insight into the quality of the experimental protocol and provide valuable information about the identified proteins.

Experimental

Sample Preparation: The nucleic acid binding proteins were purified from an E. coli cell lysate using a DNA cellulose column (Sigma). The proteins were sequentially eluted using two salt concentrations: 0.4 M NaCl (low salt elution; weak binding fraction) and 1 M NaCl (high salt elution; strong binding fraction). Each fraction was then desalted, reduced and alkylated with iodoacetamide, and digested with trypsin.

Figure 2. NanoLC analysis of cation exchange fractions. [Panels: 5 mM, 25 mM, 100 mM and 250 mM salt steps.]

Chromatography: MDLC (multi-dimensional liquid chromatography) was performed using the LC Packings integrated system (Dionex) consisting of a FAMOS micro autosampler, Switchos micro column switching module, and UltiMate micro pump. Each fraction was first loaded onto a Bio-SCX cation exchange trap cartridge (0.5 x 15 mm), then eluted stepwise onto a PepMap™ C18 trap cartridge (0.3 x 5 mm). Salt steps (200 µL each) used were 0, 5, 10, 25, 50, 100, 250 and 1000 mM ammonium acetate in 0.1% formic acid. Finally, peptides were eluted off the reverse phase trap cartridge onto the PepMap™ C18 analytical column (0.075 x 150 mm) using a linear gradient of 5-35% acetonitrile in 0.1% formic acid.

Mass Spectrometry: All LC/MS/MS data were automatically acquired using Information Dependent Acquisition (IDA) on the QSTAR Pulsar LC/MS/MS System. Pro Automate Software was used to automate the acquisition of the MDLC data, the database searching of the MS/MS data, and the generation of the time-filtered exclusion lists for subsequent MDLC runs.

Data Processing: All MS/MS spectra were automatically submitted for database searching using Pro ID Software, which identifies proteins from MS/MS spectra using the Interrogator Database Search algorithm. Either an E. coli specific subset of the NCBI database or the E. coli CDS FASTA file was used for database searching. Codon Adaptation Index calculations were performed in house using the method of Sharp and Li (1). To gain additional information on proteins of interest, the Celera Discovery System was accessed.

Results

Using Time Filtered Exclusion Lists To Dig Deeper Into Your Sample

SMART (Specific Mass And Retention Time) exclusion filters were used to allow data acquisition from as many unique peptides as possible. SMART exclusion lists were automatically generated from protein identification results using Pro Automate Software (Figure 3). For these experiments, a complete set of MDLC MS/MS runs was performed and subjected to database searching. Exclusion lists were generated from the high confidence peptide matches (> 80%), so that these peptides were specifically excluded from being sent for MS/MS in subsequent MDLC runs. For the weak nucleic acid binding fraction (low salt elution), a third MDLC run with an exclusion list was acquired to ensure that the majority of the detectable peptides were identified.

Figure 3. Automatic generation of time-filtered exclusion lists in the Pro Automate software.

Using the Protein Score (ProtScore) from Pro Automate (calculated from the peptide evidence for each protein), the data were filtered at the protein level. At a very conservative protein confidence threshold of 95%, 295 unique proteins were identified in the two DNA cellulose column fractions. Of these proteins, 69 were present in both the low salt and high salt elution fractions, with the majority eluting in the low salt fraction (Figure 4).

Figure 4. Proteins identified in each nucleic acid binding protein fraction. [Venn diagram: 90 in the high salt elution (strong binding) only, 69 in both fractions, 136 in the low salt elution (weak binding) only.]
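
The exclusion-list step described above can be illustrated with a minimal sketch. This is not the Pro Automate implementation, which the note does not describe in detail; the record layout, the retention time window, the m/z tolerance, and the idea of keying entries to a salt step are all assumptions made for illustration. Only the > 80% confidence cut-off comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideMatch:
    """One database-search hit (hypothetical record layout)."""
    sequence: str
    mz: float          # precursor m/z
    rt_min: float      # retention time in minutes
    salt_step: int     # salt step (mM) the peptide eluted in; an assumption
    confidence: float  # peptide confidence in percent


@dataclass(frozen=True)
class ExclusionEntry:
    """A mass plus retention time window to skip in the next run."""
    mz: float
    rt_start: float
    rt_end: float
    salt_step: int


def build_exclusion_list(matches, min_confidence=80.0, rt_window_min=2.0):
    """Keep only high confidence matches and turn each into an m/z + RT window.

    min_confidence follows the > 80% threshold in the text;
    rt_window_min is an illustrative value, not taken from the note.
    """
    seen = set()
    entries = []
    for m in matches:
        if m.confidence <= min_confidence:
            continue
        key = (m.sequence, round(m.mz, 2), m.salt_step)
        if key in seen:
            continue  # one entry per peptide precursor per salt step
        seen.add(key)
        entries.append(
            ExclusionEntry(
                mz=m.mz,
                rt_start=max(0.0, m.rt_min - rt_window_min),
                rt_end=m.rt_min + rt_window_min,
                salt_step=m.salt_step,
            )
        )
    return entries


def is_excluded(entries, mz, rt_min, salt_step, mz_tol=0.1):
    """True if a candidate precursor falls inside any exclusion window."""
    return any(
        e.salt_step == salt_step
        and abs(e.mz - mz) <= mz_tol
        and e.rt_start <= rt_min <= e.rt_end
        for e in entries
    )


# Made-up example data: one confident hit and one low confidence hit.
matches = [
    PeptideMatch("PEPTIDEK", 500.25, 30.0, 25, 95.0),
    PeptideMatch("SAMPLER", 400.20, 10.0, 25, 60.0),
]
exclusions = build_exclusion_list(matches)
```

For cumulative exclusion across runs, the lists built from each completed run would simply be concatenated before the next acquisition.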

In the first MDLC run from the weak binding fraction (low salt elution), 113 proteins corresponding to 537 unique peptides were identified with high confidence (> 95%). Using SMART exclusion lists, the second MDLC run was performed on the same sample and an additional 66 proteins were identified (1457 unique peptides in total). A third MDLC run was then performed using exclusion lists built from the first two MDLC results, yielding an additional 24 unique proteins and 1879 unique peptides in total (Figure 5). In total, 203 proteins were found in the weak binding fraction.

For the strong binding fraction, a similar trend was observed. The first MDLC run (high salt elution) yielded 129 proteins corresponding to 856 unique peptides with high confidence (> 95%). Using specific exclusion lists, the second MDLC run was performed on the same sample and an additional 30 proteins were identified (1493 unique peptides in total).

In this work, a ~20% increase in protein coverage was obtained for each subsequent MDLC run where peptide-based exclusion lists were applied. Thus, using multiple rounds of MDLC and applying time-based and peptide-based exclusion lists enabled the identification of more proteins overall and improved the sequence coverage for many of these proteins.

Figure 5. Number of unique peptide sequences (blue) and number of total proteins (red) found in each of the sequential MDLC MS/MS runs (Run 1; Runs 1,2; Runs 1,2,3) from the weak nucleic acid binding protein fraction.

Identifying More of the Low Abundant Proteins

Figure 6. Effects of multiple rounds of time-based exclusion on the proportion of low abundance proteins found in each run of the weak nucleic acid binding fraction.

Because of the degeneracy of the genetic code, most amino acid residues can be encoded by more than one codon. In genomes, certain codons will be favored by genes despite the availability of other codons that encode the same residue. This tendency of a gene to use specific codons to encode amino acids is called codon bias. The Codon Adaptation Index (CAI) is a measure of codon bias. It uses a reference set of highly expressed genes against which the codons from all other genes are compared (1). In E. coli, lower CAI values are thought to correlate well with proteins that are less abundant. In the third MDLC run from the weak binding fraction, a higher proportion of proteins with lower CAI values was observed (Figure 6, yellow bars), indicating that multiple rounds of acquisition using cumulative exclusion lists are a powerful and effective strategy for obtaining MS/MS from peptides originating from lower abundance proteins.
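
As a rough illustration of how a CAI value is obtained, the sketch below follows the general scheme of Sharp and Li (1): each codon receives a relative adaptiveness weight (its count in the reference set divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of those weights. The reference sequence here is a made-up toy, not the highly expressed E. coli gene set used for the in-house calculations, and the 0.5 substitute for unobserved codons and the exclusion of Met, Trp and stop codons are stated here as assumptions of this sketch.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight w = codon count / count of the most used synonymous codon.

    Met, Trp and stop codons are skipped (single-codon families or stops).
    Codons never seen in the reference set get `pseudocount` instead of 0,
    so a single rare codon does not force CAI to zero (assumption).
    """
    counts = Counter(
        c for gene in reference_genes for c in codons(gene) if c in CODON_TABLE
    )
    weights = {}
    for aa in set(CODON_TABLE.values()) - {"*", "M", "W"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        fam_counts = {c: counts[c] or pseudocount for c in family}
        top = max(fam_counts.values())
        for c in family:
            weights[c] = fam_counts[c] / top
    return weights


def cai(gene, weights):
    """Geometric mean of the codon weights over the gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy reference set: leucine encoded three times by CTG and once by CTA.
toy_weights = relative_adaptiveness(["CTGCTGCTGCTA"])
preferred = cai("CTGCTG", toy_weights)  # only the favored codon
mixed = cai("CTGCTA", toy_weights)      # one favored, one rarer codon
```

A gene that uses only the favored codons scores 1.0, and any use of rarer synonymous codons pulls the value down, which is why lower CAI values are read here as a proxy for lower expression.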

Linking Protein Identification With Biology Using The Celera Discovery System™

The proprietary PANTHER Protein Classification System (3) organizes proteins into families and subfamilies based upon global sequence similarity, common molecular functions, and participation in common biological processes. Processing of the MS/MS spectra against the E. coli CDS FASTA database using the Pro ID Software allows the gene ontology information to be visualized along with the protein identification results. Biological process and molecular function information can be used to quickly find proteins based upon similar biological attributes. For example, a large proportion of the proteins identified in this study were classified as nucleic acid binding proteins by their molecular functions. Additionally, other interesting protein classes can be rapidly identified, such as transcription factors (Figure 7).

Figure 7. Pro ID Software Protein Summary results sorted by molecular function. Inset: PANTHER gene ontology information from CDS showing the protein family and subfamily, the biological process and the molecular function (small box).

Figure 8. PANTHER molecular functions represented for the proteins identified from all DNA affinity column fractions.

Figure 8 displays a pie chart of the different molecular functions represented by the proteins identified in this sample. As shown, a significant fraction of the proteins (greater than 1/3) is classified as nucleic acid binding proteins, indicating the degree to which the sample preparation strategy was successful. A small percentage of the proteins is also classified as transcription factors, a further indication that the data acquisition strategy was successful for identifying what are typically considered to be low abundance proteins.

Figure 9. MS/MS spectrum of the peptide with sequence YLTEQGFQVR from Outer Membrane Protein R (OmpR).

Understanding The Proteins Identified

OmpR, an osmoregulatory DNA-binding protein, is normally expressed in low abundance and has a CAI value of 0.268. In this study, OmpR was identified with 9 unique peptides in the weak binding fraction, resulting in 41% sequence coverage for this 27 kDa protein. Interestingly, 4 of the 9 peptides were identified in the third MDLC run, further supporting the claim that cumulative exclusion lists enable detection of peptides from lower abundance proteins as well as increasing overall protein coverage.

The PANTHER classifications for OmpR indicate that this protein is a transcription factor (molecular function) and is involved in mRNA transcription (biological process). The Tree Viewer (Figure 10) displays the relationship between the different sequences within a family. The longer the horizontal branch length, the more distant the groups joined by those branches. The OmpR protein belongs to SF20 Transcriptional Regulatory Protein OmpR-Related at the bottom of the tree and is thus more distantly related to many other sequences in the tree relative to sequences in some other subfamilies.

OmpR is part of a two-component regulatory system, in conjunction with EnvZ, for control of the porin proteins OmpC and OmpF (Figure 11). In response to the osmolarity of the medium, EnvZ phosphorylates or stimulates the dephosphorylation of OmpR, which then acts to selectively stimulate or repress expression of OmpC and OmpF, thereby affecting the pore size in the outer membrane.

Figure 10. PANTHER distance trees allow exploration of the relationships between sequences in a particular family, as well as visualization of some of the key information that was used to annotate the families and subfamilies.

Figure 11. The E. coli pathway for the two-component system involving the OmpR transcription factor. This diagram was obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG): http://www.genome.ad.jp/kegg/.
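
The sequence coverage figure quoted for OmpR is the percentage of residues in the protein that fall within at least one identified peptide. A minimal sketch of that calculation is given below; the function name is hypothetical, and the protein and peptides in the example are short made-up strings, not the OmpR sequence or the nine peptides reported in this study.

```python
def sequence_coverage(protein, peptides):
    """Percent of protein residues covered by at least one peptide.

    Overlapping peptides are counted once per residue, and every
    occurrence of a peptide in the protein is marked as covered.
    """
    if not protein:
        raise ValueError("protein sequence is empty")
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Made-up 10-residue protein with two overlapping peptides covering MKTAY.
example_coverage = sequence_coverage("MKTAYIAKQR", ["MKT", "TAY"])
```

Because coverage is counted per residue, additional peptides found in later MDLC runs raise the value only when they map to regions not already covered.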

Conclusions

To improve the effectiveness of proteome-wide protein identification, previous studies have emphasized reducing sample complexity and increasing chromatographic separation as a means of ensuring good MS/MS coverage of complex mixtures (4,5). Here, we demonstrate a variation of this approach using MDLC and specific time-based exclusion lists generated from protein identification results, in conjunction with a sample simplification step based upon protein function. Depending upon sample complexity, multiple MDLC/MS/MS runs can be performed with cumulative levels of exclusion applied to obtain good MS/MS coverage of the peptides in the sample. For the present study, this data acquisition strategy enabled the identification of a greater proportion of lower abundance proteins (as indicated by their codon bias) as well as greater coverage (more peptides identified) per protein. A total of 295 proteins were identified with very high confidence in this experiment from the E. coli sample enriched for nucleic acid binding proteins.

Using the Celera Discovery System, important information about identified proteins can be readily accessed. Performing the database search against the annotated CDS databases allows the gene ontology information (biological process, molecular function) from the PANTHER protein classification system to be automatically imported into the Pro ID Software results. In this study, this information allowed quick assessment of the quality of the sample preparation strategy. Additionally, this information provided a means for rapidly identifying interesting subsets of proteins, such as transcription factors. Many additional tools and information exist within the Celera Discovery System to allow further exploration of interesting proteins.

Acknowledgements

Thanks to Doug Barofsky and Martha Stapels at Oregon State University for the E. coli protein digests.

References

1. Sharp, P.M. and Li, W.-H. The codon adaptation index: a measure of the directional synonymous codon usage bias, and its potential applications (1987) Nucleic Acids Res. 15, 1281-1295.
2. Thomas, P.D., Kejariwal, A., Campbell, M.J., Mi, H., Diemer, K., Guo, N., Ladunga, I., Ulitsky-Lazareva, B., Muruganujan, A., Rabkin, S., Vandergriff, J.A., Doremieux, O. PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification (2003) Nucleic Acids Res. 31, 334-341.
3. Kerlavage, A., Bonazzi, V., di Tommaso, M., Lawrence, C., Li, P., Mayberry, F., Mural, R., Nodell, M., Yandell, M., Zhang, J., and Thomas, P. The Celera Discovery System™ (2002) Nucleic Acids Res. 30, 129-136.
4. Gygi, S.P., Rist, B., Griffin, T.J., Eng, J., Aebersold, R. Proteome Analysis of Low-Abundance Proteins Using Multidimensional Chromatography and Isotope-Coded Affinity Tags (2002) J. Proteome Res. 1, 47-54.
5. Corbin, R.W. et al. Toward a Protein Profile of Escherichia coli: Comparison of its Transcription Profile (2003) PNAS 100, 9232-9237.

AB (Design), Applera, Celera Discovery System, Interrogator and PepMap are trademarks and Applied Biosystems and QSTAR are registered trademarks of Applera Corporation or its subsidiaries in the US and certain other countries.