Top-Down Proteomics Enables Comparative Analysis of Brain Proteoforms Between Mouse Strains

Roderick G. Davis 1, Hae-Min Park 1, Kyunggon Kim #1, Joseph B. Greer 1, Ryan T. Fellers 1, Richard D. LeDuc 1, Elena V. Romanova, Stanislav S. Rubakhin, Jonathan A. Zombeck #, Cong Wu #, Peter M. Yau, Peng Gao 1, Alexandra J. van Nispen 1, Steven M. Patrie 1, Paul M. Thomas 1, Jonathan V. Sweedler, Justin S. Rhodes and Neil L. Kelleher 1*

Departments of Chemistry, Molecular Biosciences and the Proteomics Center of Excellence, Northwestern University, 1 N. Sheridan Road, Evanston, IL 00.
Department of Chemistry, University of Illinois, Urbana-Champaign, 00 S. Mathews Ave, Urbana, IL 1.
Department of Psychology, University of Illinois, Urbana-Champaign, 0 N. Mathews Ave, Urbana, IL 1.
Roy J. Carver Biotechnology Center, Protein Sciences Facility, University of Illinois, Urbana-Champaign, 0 S. Mathews Ave, Urbana, IL

# Current Address:
(1) Kyunggon Kim: Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Olympic-ro -gil, Songpa-gu, Seoul, 00, Korea
() Cong Wu: Process Development, Amgen Inc., One Amgen Center Drive, Thousand Oaks, CA 9
() Jonathan Zombeck: Massachusetts Institute of Technology, NE1-01, Main Street, Cambridge, MA 01

Table of Supporting Information

Method S-1 Data Analysis Extended Methods
Figure S-1 Venn Diagrams for the distribution of proteins and proteoforms from mouse strains
Figure S- LC-MS Reproducibility
Figure S- Proportioned variation plot across the quantitative experiment
Figure S- Quantitative comparison of 9 proteoforms from DBA/2J, FVB/NJ, and BALB/cByJ
Figure S- ARPP-1 Proteoform Graphical Fragment Maps
Figure S- GeneGo analyses showing the binary comparison of strains DBA/2J, FVB/NJ, or BALB/cByJ to C57BL/6J
Figure S- Reformatted binary comparisons for GO Processes showing the top 1 enrichment items
Figure S- Reformatted binary comparisons for GO Molecular Function showing the top 1 enrichment items
Figure S-9 Reformatted binary comparisons for GO Localization showing the top 1 enrichment items
Table S-1 List of proteins identified by top-down proteomics reported
Table S- List of proteoforms identified by top-down proteomics reported at a % qualitative FDR

Method S-1. Data Analysis Extended Methods

The data analysis pipeline runs on Northwestern University's Quest high-performance computing (HPC) system, a facility that offers 9 compute nodes with over 1,000 total cores at present. To run the analysis described below, 0 RAW files (containing data acquired from five technical replicates of samples collected from 1 mice ( mice x strains)) were processed on Quest using. SUs (service units). Specifically, one pipeline software application converts .raw files to the standard mzML format. ProSightPC's Crawler logic is then used to link MS2 fragmentations; contiguous scans with a 0.1 Da tolerance and within a .0 min retention time tolerance are grouped. Mass-to-charge data for each precursor and fragmentation scan pair are deconvoluted to produce monoisotopic neutral mass values.

For proteoform identification and characterization, MS1 and MS2 queries were searched against an annotated mouse database using a three-pronged search tree (Tight Absolute Mass, BioMarker, and Find Unexpected Modifications searches). The Tight Absolute Mass search was conducted first, with an MS1 tolerance of . Da and an MS2 tolerance of ppm. The second search, again with all queries, was a BioMarker search performed with both MS1 and MS2 tolerances set to ppm. The last search was a Find Unexpected Modifications search with an MS1 tolerance of 00 Da and an MS2 tolerance of ppm, using Delta M mode, which allows detection of proteins with one unknown modification.

For quantitative data processing, proteoform-level extracted ion chromatograms were generated to determine an appropriate intensity value for each proteoform that passed a qualitative % FDR cutoff for proteoform identification. First, the isotopic distributions of all theoretical charge states of that proteoform were generated. These distributions were then matched against the observed spectral data to return intensity estimates for each scan. Next, the intensity estimates were aggregated across all scans and charge states to report one intensity value for each data file and proteoform. Finally, intensities were normalized using the average total ion chromatogram intensity for each technical replicate. These data were provided as a text file for further statistical processing (vide infra). Because technical issues occurred during data collection for one of the biological replicates from each strain, only four biological replicates per strain were passed on for further analysis.
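The aggregation and normalization steps of the quantitative processing can be illustrated with a minimal Python sketch. It is not the authors' pipeline code: the function names, the tuple layout of the per-scan estimates, the use of summation as the aggregation, and division by the average total ion chromatogram (TIC) intensity as the normalization are all assumptions made for illustration, and the numbers are invented.

```python
from collections import defaultdict


def aggregate_intensities(estimates):
    """Collapse per-scan, per-charge-state estimates into one value
    per (data file, proteoform).

    `estimates` is a hypothetical list of tuples:
    (file_name, proteoform_id, charge_state, scan_number, intensity).
    Summation is an assumed aggregation rule.
    """
    totals = defaultdict(float)
    for file_name, proteoform_id, _charge, _scan, intensity in estimates:
        totals[(file_name, proteoform_id)] += intensity
    return dict(totals)


def normalize_by_mean_tic(totals, mean_tic):
    """Divide each aggregated intensity by the average TIC intensity
    of the data file (technical replicate) it came from.

    `mean_tic` maps file_name -> average TIC intensity (assumed input).
    """
    return {
        (file_name, proteoform_id): value / mean_tic[file_name]
        for (file_name, proteoform_id), value in totals.items()
    }


# Invented example data: two data files, one proteoform.
example_estimates = [
    ("run1.raw", "PF1", 10, 101, 2.0),
    ("run1.raw", "PF1", 11, 101, 3.0),
    ("run1.raw", "PF1", 10, 102, 5.0),
    ("run2.raw", "PF1", 10, 99, 4.0),
]
example_mean_tic = {"run1.raw": 10.0, "run2.raw": 8.0}

example_totals = aggregate_intensities(example_estimates)
example_normalized = normalize_by_mean_tic(example_totals, example_mean_tic)
```

With these invented inputs, the proteoform receives one normalized intensity per data file, which is the shape of the table passed on to statistical processing.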

Figure S-1. Venn Diagrams for the distribution of proteins and proteoforms from mouse strains. Venn diagrams showing the distribution of total 11 protein identifications and fully characterized proteoforms ranging from ~. to 0 kDa among the C57BL/6J, BALB/cByJ, DBA/2J and FVB/NJ strains in this study, using a 1% global qualitative FDR cutoff at the protein level. Approximately 9% of identified proteins and % of characterized proteoforms were shared between at least two strains.
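The "shared between at least two strains" figure quoted in the caption is a set computation over the per-strain identification lists. The sketch below shows one way to compute it; the function name, the strain labels and the identifiers are hypothetical, and the denominator is assumed to be the union of all identifications.

```python
from collections import Counter


def shared_fraction(ids_by_strain, min_strains=2):
    """Fraction of all distinct identifications that were observed
    in at least `min_strains` strains."""
    counts = Counter()
    for ids in ids_by_strain.values():
        counts.update(set(ids))  # count each identifier once per strain
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n >= min_strains)
    return shared / len(counts)


# Invented identifiers for four hypothetical strains.
example_ids = {
    "strain_a": {"p1", "p2", "p3"},
    "strain_b": {"p2", "p3"},
    "strain_c": {"p3", "p4"},
    "strain_d": {"p5"},
}
example_shared = shared_fraction(example_ids)  # 2 of 5 identifiers are shared
```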

Figure S-. Reproducibility of LC-MS. Total ion chromatograms from LC-MS analysis of each of the 0 replicates ( biological replicates x technical replicates) of one strain, BALB/cByJ. Chromatographic reproducibility was ensured by holding the trap and nanoLC column at a constant temperature of 0 °C. Typically, for a proteoform of mid-level abundance, the relative standard deviations in retention time, base width and FWHM were 1.%, .% and .%, respectively (data not shown).

Figure S-. Variation Across Quantitative Experiment. Box-and-whisker plots of the proportion of variation in normalized MS intensities assigned to each of four categories for the C57BL/6J, BALB/cByJ, DBA/2J and FVB/NJ strains. The four categories are inter-strain variation (labeled as Treatment), the Biological replicate (different animals), the Technical replicate (five LC-MS injections from each biological replicate), and the Residual (the remaining sources of variation unassigned by the hierarchical linear model). The box indicates the range within which % of the variances fall, and the whiskers indicate the range encompassing 9% of the measurements. The data support our assertion that top-down proteomics can be used to measure protein differences in mouse brain tissues.
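The proportional variation plotted for each proteoform can be read as each variance component divided by the sum of all components. The sketch below assumes that the variance components have already been estimated by a hierarchical linear model (the fitting itself is not shown), and the component names and values are invented.

```python
def variance_proportions(components):
    """Convert variance component estimates for one proteoform into
    proportions of the total variance.

    `components` maps a category name (for example treatment, biological,
    technical, residual) to a non-negative variance estimate.
    """
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: value / total for name, value in components.items()}


# Invented variance components for a single proteoform.
example_components = {
    "treatment": 2.0,
    "biological": 1.0,
    "technical": 0.5,
    "residual": 0.5,
}
example_proportions = variance_proportions(example_components)
```

Repeating this for every quantified proteoform gives one distribution of proportions per category, which is what a box-and-whisker plot of this kind summarizes.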

[Volcano plots, panels A, B and C. x-axis: log fold change; y-axis: -log false discovery rate. Direction labels: Up in DBA/2J, Up in FVB/NJ, Up in BALB/cByJ.]

Figure S-. Quantitative comparison of 9 proteoforms from DBA/2J, FVB/NJ, and BALB/cByJ. Volcano plots were generated for the pairwise comparisons, showing each proteoform (blue dots) as a function of the relative abundance difference between the strains (x-axis, log fold-change) and the statistical confidence that the difference in normalized MS intensity is significant (y-axis, FDR-corrected instantaneous q-value). For BALB/cByJ vs. DBA/2J (A), DBA/2J vs. FVB/NJ (B), and BALB/cByJ vs. FVB/NJ (C), , 01, and 19 proteoforms were identified, respectively, below a % quantitative FDR cutoff (horizontal red line) and above a 1.-fold change (vertical red line) in proteoform abundance between the strains.
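The two thresholds in the volcano plots (a q-value cutoff and a fold-change cutoff) can be applied as in the sketch below. The Benjamini-Hochberg procedure is used here only as a common stand-in for FDR correction; the supporting information does not state which correction produced the q-values. The function names, the base-2 logarithm, and the cutoff values passed in the example are assumptions, not the values used in the study.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values),
    in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def select_hits(records, q_cutoff, fold_cutoff):
    """Keep proteoforms with q below `q_cutoff` and an absolute
    log2 fold change above log2(`fold_cutoff`).

    `records` is a hypothetical list of (proteoform_id, log2_fc, q).
    """
    log_cutoff = math.log2(fold_cutoff)
    return [
        pid for pid, log2_fc, q in records
        if q < q_cutoff and abs(log2_fc) > log_cutoff
    ]


# Invented p-values and fold changes; the cutoffs are placeholders.
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
example_records = [
    ("pf_a", 1.2, 0.01),
    ("pf_b", 0.2, 0.01),
    ("pf_c", -1.5, 0.20),
    ("pf_d", -0.9, 0.03),
]
example_hits = select_hits(example_records, q_cutoff=0.05, fold_cutoff=1.5)
```

In this invented example only `pf_a` and `pf_d` pass both thresholds: `pf_b` changes too little and `pf_c` is not significant.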

Figure S-. ARPP-1 Proteoform Graphical Fragment Maps. Top-scoring graphical fragment maps showing the HCD fragmentation for the three detected phosphorylation proteoforms of ARPP-1. The LC-MS and fragmentation data did not distinguish between the two mono-phosphorylated proteoforms. For each proteoform, the experimental mass matched the theoretical mass within ppm. A blue box indicates a phosphorylated serine residue. Only literature-reported sequence variants and PTM combinations are used to annotate the protein entries in the mouse database.

Figure S-. GeneGo analyses showing the binary comparison of strains DBA/2J, FVB/NJ, or BALB/cByJ to C57BL/6J. GO analysis comparing each strain to C57BL/6J for enrichment in GO Processes, GO Molecular Function and GO Localization.

Figure S-. Reformatted binary comparisons for GO Processes. The top 1 enrichment items are shown.

Figure S-. Reformatted binary comparisons for GO Molecular Function. The top 1 enrichment items are shown.

Figure S-9. Reformatted binary comparisons for GO Localization. The top 1 enrichment items are shown. The top six scoring items are mitochondria-related; mitochondria are a site where effects of drugs of abuse have been shown to occur. The strain profiles observed in the three GO categories further illustrate the relatedness of DBA/2J to BALB/cByJ and of C57BL/6J to FVB/NJ at the brain proteoform level.