PROTEOINFORMATICS OVERVIEW

1 PROTEOINFORMATICS OVERVIEW. August 11th, 2016. Pratik Jagtap, Center for Mass Spectrometry and Proteomics.

2 Outline
PROTEOMICS WORKFLOW
PEAKLIST PROCESSING
Search Databases Overview
Protein Identification
Protein Validation and Quantification
Publication Guidelines
Terminology: RAW file; Peaklist; Peaklist processing; Peptide-Spectral Match (PSM); Genome Assembly and annotation; Variety of search databases

3 PROTEOMICS WORKFLOW Eng et al 2011 Mol Cell Proteomics. 10(11): R

4 PROTEOMICS WORKFLOW Mass Spectrometer -> Mass spectral data (.RAW) -> Processing -> Search databases -> Protein Identification -> Statistical validation of Protein Identification -> Protein Quantitation.

6 MASS SPECTRAL DATA.

7 MASS SPECTRAL DATA Eng et al 2011 Mol Cell Proteomics. 10(11): R Cappadona et al 2012 Amino Acids. Sep 2012; 43(3):

9 Peaklist Processing

10 RAW DATA CONVERSION TOOLS XRawfile library from ThermoFinnigan Xcalibur software. .raw to mzXML: ReAdW. .raw to mzML: msconvert (ProteoWizard). Others: Raw2MSM, extract_msn, DeconMSn, DTASuperCharge.
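
As an illustration of the conversion step, the following is a minimal sketch that only builds a ProteoWizard msconvert command line for one RAW file. The file name `sample.raw`, the output folder `converted`, and the helper name `build_msconvert_command` are assumptions made for this example; running the command requires a local ProteoWizard installation, and the slide does not specify any particular options.

```python
import shlex
from pathlib import Path


def build_msconvert_command(raw_file, out_dir, fmt="mzML"):
    """Return the argument list for converting one vendor RAW file.

    Only the output-format switch and the output directory are set here;
    any peak-picking or filtering options are left to the user.
    """
    if fmt not in ("mzML", "mzXML"):
        raise ValueError(f"unsupported output format: {fmt}")
    return ["msconvert", str(Path(raw_file)), f"--{fmt}", "-o", str(Path(out_dir))]


# Hypothetical input and output names, for illustration only.
cmd = build_msconvert_command("sample.raw", "converted")
print(shlex.join(cmd))

# To actually run the conversion (assumes msconvert is on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the argument list separately from executing it makes the command easy to inspect or log before a long batch conversion is started.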

11 ORBITRAP: PROCESSING AND EFFECTS The average mass error (ppm) and its standard deviation improve when MaxQuant-processed files are used.
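
The ppm figures referred to here can be computed as sketched below, assuming that "ppm" means the relative difference between observed and theoretical m/z, scaled by one million. The function names and the two example m/z pairs are hypothetical and are not taken from the slide.

```python
from statistics import mean, stdev


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def summarize_ppm(pairs):
    """Return (mean, sample standard deviation) of the ppm errors.

    `pairs` is an iterable of (observed_mz, theoretical_mz) tuples and
    must contain at least two entries for the standard deviation.
    """
    errors = [ppm_error(obs, theo) for obs, theo in pairs]
    return mean(errors), stdev(errors)


# Made-up values for illustration only.
example_pairs = [(500.0005, 500.0), (500.0015, 500.0)]
avg_ppm, sd_ppm = summarize_ppm(example_pairs)
print(f"average = {avg_ppm:.2f} ppm, standard deviation = {sd_ppm:.2f} ppm")
```

Comparing these two numbers for the same identified peptides before and after peaklist processing is one way to quantify the improvement the slide describes.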

16 DATABASE SEARCH Mass spectrum Search against database.
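
One ingredient of a database search can be sketched as follows: digest each database protein in silico and keep the peptides whose mass matches the measured precursor mass within a tolerance. This is a simplified illustration, not the algorithm of any particular search engine. It assumes tryptic cleavage (after K or R, not before P), no missed cleavages, no modifications, and a neutral monoisotopic precursor mass; real engines also score the fragment spectrum. The protein name and sequence are made up.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def search_precursor(observed_mass, database, tolerance_ppm=10.0):
    """Return (protein, peptide, ppm error) for every candidate within tolerance."""
    hits = []
    for protein, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            theoretical = peptide_mass(peptide)
            error = (observed_mass - theoretical) / theoretical * 1e6
            if abs(error) <= tolerance_ppm:
                hits.append((protein, peptide, error))
    return hits


# Hypothetical one-protein database.
database = {"example_protein": "AGKPLRSEK"}
hits = search_precursor(peptide_mass("SEK"), database)
print(hits)
```

A tighter `tolerance_ppm` reduces the number of candidate peptides per spectrum, which is one reason accurate precursor masses matter for the search.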

17 DNA GENOME PROTEOMIC DATABASE. Salzberg Genome Biology :102 doi:

18 GENOMIC AND PROTEOMIC DATABASES Finished and Published Genomes: 3551 Bacterial Genomes. 211 Archaeal Genomes. 58 Eukaryal Genomes. Viral Genomes

19 PROTEOMIC DATABASES CUSTOMIZED DATABASES

20 PROTEOMIC DATABASES Swiss-Prot is the manually annotated and reviewed section of the UniProt Knowledgebase (UniProtKB). It is a high-quality annotated and non-redundant protein sequence database, which brings together experimental results, computed features and scientific conclusions. TrEMBL contains high-quality computationally analyzed records, which are enriched with automatic annotation. The translations of annotated coding sequences in the EMBL-Bank/GenBank/DDBJ nucleotide sequence database are automatically processed and entered in TrEMBL.
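
In UniProtKB FASTA downloads the two sections can be told apart by the header prefix: `sp|` marks Swiss-Prot (reviewed) entries and `tr|` marks TrEMBL (unreviewed) entries. The sketch below groups accessions by that prefix. The function name and the two FASTA records are made up for illustration and do not refer to real entries.

```python
def split_uniprot_fasta(fasta_text):
    """Group accessions by UniProtKB section using the FASTA header prefix.

    Headers are expected in the form '>sp|ACCESSION|ENTRY_NAME description'
    or '>tr|ACCESSION|ENTRY_NAME description'; other lines are ignored.
    """
    groups = {"sp": [], "tr": []}
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line
        parts = line[1:].split("|")
        if len(parts) >= 3 and parts[0] in groups:
            groups[parts[0]].append(parts[1])
    return groups


# Hypothetical records, for illustration only.
example_fasta = """>sp|X00001|EXMP1_HUMAN Example reviewed protein OS=Homo sapiens
MKWVTFISLL
>tr|X00002|X00002_HUMAN Example unreviewed protein OS=Homo sapiens
MSEQAGKPLR
"""
sections = split_uniprot_fasta(example_fasta)
print(sections)
```

A split like this is useful when a search should be restricted to reviewed entries, or when the share of unreviewed sequences in a search database needs to be reported.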

21 UNIPROT DATABASE

23 PROTEOMIC DATABASES The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. RefSeq sequences form a foundation for medical, functional, and diversity studies. They provide a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis (especially RefSeqGene records), expression studies, and comparative analyses.

24 CUSTOMIZED PROTEOMIC DATABASES
RNASeq data.
Expressed sequence tags / cDNA sequences: three-frame translation.
Genomic DNA sequences: six-frame translation.
Metagenomic databases: translation.
Customized database repositories (CPTAC / UniMesh).
Translation and database reduction workflows.
Proteomic databases.
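
The translation step named on this slide can be sketched as follows, assuming the standard genetic code and an input containing only the letters A, C, G and T. The function names and the short DNA string are made up for illustration; stop codons are written as `*`, and splitting the output at stops and removing short or redundant sequences (database reduction) is not shown.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate from the first base; an incomplete trailing codon is dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translation(dna):
    """Return the three forward and three reverse-complement reading frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# Made-up sequence for illustration only.
frames = six_frame_translation("ATGGCCTAA")
for name, protein in sorted(frames.items()):
    print(name, protein)
```

A three-frame translation, as listed for expressed sequence tags and cDNA, corresponds to keeping only the `+1`, `+2` and `+3` entries, on the assumption that the strand of the transcript is known.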
