Since 2002 a merger and collaboration of three databases: Swiss-Prot & TrEMBL



2 Since 2002, a merger and collaboration of three databases: Swiss-Prot, TrEMBL and PIR-PSD. Funded mainly by the NIH (US) to be the highest-quality, most thoroughly annotated protein sequence database.

3
o A high-quality protein sequence database: a non-redundant protein database with maximal coverage, including splice isoforms, disease variants and PTMs. Sequence archiving is essential.
o Easy protein identification: stable identifiers and consistent nomenclature/controlled vocabularies.
o Thorough protein annotation: detailed information on protein function, biological processes, molecular interactions and pathways, cross-referenced to external sources.


5
UniProtKB/TrEMBL: 1 entry per nucleotide submission. Redundant, automatically annotated (unreviewed).
UniProtKB/Swiss-Prot: 1 entry per protein. Non-redundant, high-quality manual annotation (reviewed).
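The reviewed/unreviewed split is visible in UniProtKB FASTA headers, which begin with `sp|` for Swiss-Prot entries and `tr|` for TrEMBL entries. The following is a minimal sketch of telling the two apart; the function name `review_status` is ours, and the second header is a made-up illustration rather than a real entry.

```python
def review_status(fasta_header: str) -> str:
    """Classify a UniProtKB FASTA header by its database prefix."""
    # Drop the leading '>' if present, then read the field before the first '|'.
    prefix = fasta_header.lstrip(">").split("|", 1)[0]
    if prefix == "sp":
        return "reviewed (UniProtKB/Swiss-Prot)"
    if prefix == "tr":
        return "unreviewed (UniProtKB/TrEMBL)"
    return "unknown"


headers = [
    ">sp|P05067|A4_HUMAN Amyloid-beta precursor protein OS=Homo sapiens",
    # Hypothetical header, for illustration only.
    ">tr|A0A000|A0A000_EXAMPLE Illustrative unreviewed entry",
]

for header in headers:
    print(header.split(" ", 1)[0], "->", review_status(header))
```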

6 [Diagram: data sources for UniProtKB/TrEMBL: ENA (EMBL) DNA database, PDB, Sub/Peptide data, Ensembl, FlyBase, WormBase, VEGA (Sanger), mRNA data, patent data]

7 Manual annotation of UniProtKB/Swiss-Prot: sequence, splice variants, sequence features, annotations, ontologies, nomenclature, references.


10 Beta.uniprot.org

11 Sequence curation, stable identifiers, versioning and archiving. Curation corrects, for example, erroneous gene model predictions, frameshifts, premature stop codons, read-throughs and erroneous initiator methionines.
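Stable identifiers can be checked mechanically. The sketch below validates a string against the accession-number pattern published in the UniProt help pages; the helper name `is_valid_accession` is ours, and the pattern covers primary accessions only (not isoform suffixes or entry names).

```python
import re

# Accession pattern as documented in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_valid_accession(text: str) -> bool:
    """Return True if the whole string is a well-formed UniProtKB accession."""
    return ACCESSION_RE.fullmatch(text) is not None


for candidate in ["P05067", "A0A024R161", "p05067", "P0506"]:
    print(candidate, is_valid_accession(candidate))
```

A well-formed accession is not necessarily an existing one; this only checks the shape of the identifier.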

12 Splice isoforms

13 Identification of amino acid variants, and of PTMs, and also ...

14 Sequence annotation

15 Protein nomenclature


17 Controlled vocabularies used whenever possible

18 Annotation comments: FUNCTION, SUBCELLULAR LOCATION, ALTERNATIVE PRODUCTS, TISSUE SPECIFICITY, DEVELOPMENTAL STAGE, INDUCTION, SIMILARITY, CATALYTIC ACTIVITY, COFACTOR, ENZYME REGULATION, BIOPHYSICOCHEMICAL PROPERTIES, PATHWAY, SUBUNIT, INTERACTION, PTM, RNA EDITING, MASS SPECTROMETRY, DOMAIN, POLYMORPHISM, DISRUPTION PHENOTYPE, ALLERGEN, DISEASE, TOXIC DOSE, BIOTECHNOLOGY, PHARMACEUTICAL, MISCELLANEOUS, CAUTION, SEQUENCE CAUTION, WEB RESOURCE
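In the UniProtKB flat-file format these comment topics appear on `CC` lines of the form `CC   -!- TOPIC: text`, with continuation lines indented under them. The following is a rough sketch of grouping comment text by topic; the function name `parse_comments` and the sample entry are illustrative, and the parser ignores structured sub-fields that some topics carry.

```python
def parse_comments(flat_text: str) -> dict[str, list[str]]:
    """Group the text of CC (comment) lines by topic."""
    comments: dict[str, list[str]] = {}
    topic = None
    for line in flat_text.splitlines():
        if not line.startswith("CC"):
            topic = None  # Left the comment block.
            continue
        body = line[2:].strip()
        if body.startswith("---"):
            topic = None  # Separator around the trailing notice block.
        elif body.startswith("-!-"):
            head, _, text = body[3:].strip().partition(":")
            topic = head.strip()
            comments.setdefault(topic, []).append(text.strip())
        elif topic is not None and body:
            # Continuation line: append to the most recent comment.
            comments[topic][-1] = (comments[topic][-1] + " " + body).strip()
    return comments


# Made-up entry fragment, for illustration only.
SAMPLE = """\
ID   EXAMPLE_HUMAN           Reviewed;         100 AA.
CC   -!- FUNCTION: Illustrative text about what the protein does,
CC       continued on a second line.
CC   -!- SUBCELLULAR LOCATION: Cytoplasm.
CC   -!- SUBUNIT: Homodimer.
CC   ---------------------------------------------------------------
CC   Illustrative notice, not part of any comment topic.
CC   ---------------------------------------------------------------
"""

for name, texts in parse_comments(SAMPLE).items():
    print(name, "->", texts)
```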

19 Automatic Annotation for UniProtKB/TrEMBL

20
UniProtKB/Swiss-Prot: manually curated proteins, capturing information available in the literature (~545,000 entries).
UniProtKB/TrEMBL: automated annotation only (~56,000,000 entries).
Most proteins whose sequence we now know have not been biochemically profiled in any laboratory and probably never will be.
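To put the two entry counts in proportion, a quick calculation using the approximate figures quoted on this slide (they are rounded and will differ between releases):

```python
# Approximate entry counts quoted on the slide.
reviewed = 545_000        # UniProtKB/Swiss-Prot
unreviewed = 56_000_000   # UniProtKB/TrEMBL

share_reviewed = reviewed / (reviewed + unreviewed)
print(f"Manually reviewed share of UniProtKB: {share_reviewed:.2%}")
```

With these figures, just under 1% of entries are manually reviewed, which is why automatic annotation matters for the rest.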

21 InterPro: a database that integrates predictive information about proteins' function from partner resources, giving an overview of the families that a protein belongs to and the domains and sites it contains. Users who have novel nucleotide or protein sequences to characterise functionally can use the software package InterProScan to run the scanning algorithms from the InterPro database.
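As a rough illustration of working with InterProScan results: the tool can write tab-separated output (for example with `interproscan.sh -i proteins.fasta -f tsv`), in which, as we understand the format, the first column is the protein identifier and the twelfth is the InterPro accession, with `-` where no InterPro entry is integrated. The sketch below collects InterPro accessions per protein under that assumption; the column positions should be checked against the InterProScan documentation for the version in use, and all sample values are made up.

```python
import csv
import io
from collections import defaultdict


def interpro_entries_by_protein(tsv_text: str) -> dict[str, set[str]]:
    """Collect InterPro accessions per protein from InterProScan-style TSV text."""
    hits: dict[str, set[str]] = defaultdict(set)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 12:
            continue  # Row has no InterPro column.
        protein_id, interpro_acc = row[0], row[11]  # Assumed column positions.
        if interpro_acc and interpro_acc != "-":
            hits[protein_id].add(interpro_acc)
    return dict(hits)


# Made-up rows; identifiers are placeholders, not real signatures.
rows = [
    ["seq1", "md5a", "120", "Pfam", "PF_EXAMPLE1", "Example domain",
     "5", "80", "1.2E-10", "T", "01-01-2020", "IPR_EXAMPLE1", "Example family"],
    ["seq1", "md5a", "120", "Gene3D", "G3D_EXAMPLE", "Example fold",
     "10", "90", "3.0E-5", "T", "01-01-2020", "-", "-"],
    ["seq2", "md5b", "300", "Pfam", "PF_EXAMPLE2", "Another domain",
     "20", "150", "4.0E-20", "T", "01-01-2020", "IPR_EXAMPLE2", "Another family"],
]
SAMPLE_TSV = "\n".join("\t".join(r) for r in rows)

print(interpro_entries_by_protein(SAMPLE_TSV))
```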


24 Automatic Annotation
o UniProtKB employs two prediction systems, referred to as UniRule and SAAS.
o SAAS (Statistical Automatic Annotation System) generates a new set of decision trees with every UniProtKB release using data mining.
o UniRule uses a set of manually established and maintained annotation rules.
(Diagram labels: Swiss-Prot, InterPro)
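To make the idea of an annotation rule concrete, here is a toy sketch of rule-based annotation. It is not the UniRule implementation: the `Rule` structure, the choice of conditions (required signatures plus a taxon in the lineage) and every identifier in the example are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: if all conditions hold, propagate the annotation."""
    required_signatures: frozenset[str]
    taxon: str
    annotation: tuple[str, str]  # (comment topic, text)


def apply_rules(signatures: set[str], lineage: list[str], rules: list[Rule]) -> list[tuple[str, str]]:
    """Return the annotations of every rule whose conditions the protein meets."""
    annotations = []
    for rule in rules:
        has_signatures = rule.required_signatures <= signatures
        in_taxon = rule.taxon in lineage
        if has_signatures and in_taxon:
            annotations.append(rule.annotation)
    return annotations


# Placeholder rule and protein; none of these identifiers are real.
rules = [
    Rule(frozenset({"IPR_EXAMPLE1"}), "Bacteria", ("FUNCTION", "Illustrative function text.")),
    Rule(frozenset({"IPR_EXAMPLE1", "IPR_EXAMPLE2"}), "Eukaryota", ("SUBUNIT", "Illustrative subunit text.")),
]

print(apply_rules({"IPR_EXAMPLE1"}, ["Bacteria", "Proteobacteria"], rules))
```

A SAAS-style system would, by contrast, learn such conditions from the data (as decision trees) rather than have curators write them.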


26 Proteomes. Definition: the complete proteome of an organism is all the proteins expressed by that organism.

27 Two types of Proteomes
o Complete proteomes: complete sets of proteins thought to be expressed by organisms whose genomes have been completely sequenced.
o Reference proteomes: some complete proteomes have been selected as reference proteome sets. These cover the proteomes of well-studied model organisms and other proteomes of interest for biomedical research.

28 Requirements for Complete Proteomes
o Completely sequenced genome
o Good gene prediction models
o Proteins are mapped to the genome
o Good-quality transcriptome/proteome data

29 Obtaining Proteomes
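Proteomes can also be fetched programmatically. The sketch below builds a download URL for the present-day UniProt REST service; the endpoint `https://rest.uniprot.org/uniprotkb/stream` and the `proteome:` and `reviewed:` query fields are assumptions that postdate these slides and should be checked against the current UniProt API documentation. The helper names are ours, and `UP000005640` is the human reference proteome identifier.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the current UniProt REST API; verify before relying on it.
BASE_URL = "https://rest.uniprot.org/uniprotkb/stream"


def proteome_fasta_url(proteome_id: str, reviewed_only: bool = False) -> str:
    """Build a URL that requests all entries of one proteome in FASTA format."""
    query = f"proteome:{proteome_id}"
    if reviewed_only:
        query += " AND reviewed:true"
    return BASE_URL + "?" + urlencode({"query": query, "format": "fasta"})


def download_proteome(proteome_id: str, path: str) -> None:
    """Stream the proteome to a local file (performs a network request)."""
    with urlopen(proteome_fasta_url(proteome_id)) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)


print(proteome_fasta_url("UP000005640"))
```

Only the URL is built here; calling `download_proteome("UP000005640", "human.fasta")` would perform the actual download.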


31 Stuck? Just ask: there is an active help and support team. Feedback: if you find something incorrect, outdated, missing, etc., please tell us.

32 Thanks for your attention