Proteomics databases
- Lorena Randall
1 Proteomics databases and protein characterization tools Part I Proteomics databases
2 Proteomics databases
1. Sequence databases: «The story of a protein sequence's life»
2. Swiss-Prot: a quick overview
3. UniProt utilities: UniRef and UniParc
4. Swiss-Prot and the other protein databases
Where do the protein sequences come from? How reliable are they? What do you have to take care of?
3 Real life of a protein sequence
[Diagram: flow of sequence data between databases]
- Data not submitted to public databases, delayed or cancelled
- cDNAs, ESTs, genomes, ... -> EMBL, GenBank, DDBJ (with or without annotated CDS)
- CoDing Sequences provided by submitters -> TrEMBL, GenPept
- Manually annotated -> Swiss-Prot
- CoDing Sequences provided by submitter and «de novo» gene prediction -> RefSeq (XP_NNNNN)
- Scientific publications derived sequences -> PRF, PIR
- 3D structures
UniProt: Swiss-Prot + TrEMBL + (PIR)
NCBI-nr: Swiss-Prot + GenPept + (PIR) + RefSeq + PDB + PRF
Let's start at the very beginning
4 Real life of a protein sequence
[Diagram: cDNAs, ESTs, genomes, ... -> EMBL, GenBank, DDBJ, with or without annotated CDS provided by authors; data not submitted to public databases, delayed or cancelled]
CDS: CoDing Sequence, the portion of DNA/RNA translated into protein (from Met to STOP).
EMBL/GenBank/DDBJ
The 3 main public nucleic acid sequence databases are EMBL (EBI), GenBank (NCBI) and DDBJ (Japan): «different views of the same data set», exchanged within 2-3 days.
Contribution: EMBL 10 %; GenBank 73 %; DDBJ 17 %
EMBL: since 1982
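The CDS definition above (from Met to STOP) can be illustrated with a short translation routine. This is a minimal sketch, assuming the standard genetic code and an ATG start codon; the function name `translate_cds` and the toy input sequence are illustrative and do not come from the slides.

```python
# Standard genetic code, written in the usual T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds):
    """Translate a CDS from its initiator Met to the first in-frame STOP.

    Assumption: the CDS starts with ATG and contains only A, C, G, T/U.
    """
    seq = cds.upper().replace("U", "T")
    if not seq.startswith("ATG"):
        raise ValueError("expected the CDS to start with the ATG (Met) codon")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":  # STOP codon: the protein ends here
            return "".join(protein)
        protein.append(amino_acid)
    raise ValueError("no in-frame STOP codon found")


print(translate_cds("atgggggtgcacgaatga"))
```

Real submissions are messier than this sketch: alternative start codons, other translation tables (see the /transl_table qualifier in EMBL entries) and ambiguous bases would all need extra handling.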
5 EMBL/GenBank/DDBJ
Serve as archives. Contain all public sequences derived from:
- Genome projects (> 80 % of entries)
- Sequencing centers (cDNAs, ESTs, ...)
- Individual scientists (15 % of entries)
- Patent offices (e.g. European Patent Office, EPO)
Currently: 30 x 10^6 sequences, ~36 x 10^9 bp; sequences from > different species.
The tremendous increase in nucleotide sequences
[Chart: growth in the number of sequences; series: Human, Mouse, Rat, Other]
1980: 80 genes fully sequenced!
Human/Mouse/Rat: organisms with the highest redundancy!
6 EMBL/GenBank/DDBJ
A sort of sequence museum, where sequences are preserved for eternity as they were determined, interpreted and published originally by their authors (primary sequence repository).
The authors have full authority over the content of the entries they submit! (exception: TPA, since January 2003)
An EMBL entry:
ID HSERPG standard; genomic DNA; HUM; 3398 BP.
XX
AC X02158;
XX
SV X
XX
DT 13-JUN-1985 (Rel. 06, Created)
DT 22-JUN-1993 (Rel. 36, Last updated, Version 2)
XX
DE Human gene for erythropoietin
XX
KW erythropoietin; glycoprotein hormone; hormone; signal peptide.
XX
OS Homo sapiens (human)
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN [1]
RP
RX MEDLINE;
RA Jacobs K., Shoemaker C., Rudersdorf R., Neill S.D., Kaufman R.J.,
RA Mufson A., Seehra J., Jones S.S., Hewick R., Fritsch E.F., Kawakita M.,
RA Shimizu T., Miyake T.;
RT Isolation and characterization of genomic and cDNA clones of human
RT erythropoietin;
RL Nature 313: (1985).
XX
DR GDB; ; EPO.
DR GDB; ; TIMP1.
DR Swiss-Prot; P01588; EPO_HUMAN.
XX
Slide callouts: DNA (genomic) or RNA; keyword; taxonomy; references; cross-references.
7 An EMBL entry (cont.)
CC Data kindly reviewed (24-FEB-1986) by K. Jacobs
FH Key Location/Qualifiers
FH
FT source
FT /db_xref=taxon:9606
FT /organism=Homo sapiens
FT mRNA join( , , , , )
FT CDS join( , , , , )
FT /db_xref=SWISS-PROT:P01588
FT /product=erythropoietin
FT /protein_id=CAA
FT /translation=MGVHECPAWLWLLLSLLSLPLGLPVLGAPPRLICDSRVLQRYLLE
FT AKEAENITTGCAEHCSLNENITVPDTKVNFYAWKRMEVGQQAVEVWQGLALLSEAVLRG
FT QALLVNSSQPWEPLQLHVDKAVSGLRSLTTLLRALGAQKEAISPPDAASAAPLRTITAD
FT TFRKLFRVYSNFLRGKLKLYTGEACRTGDR
FT mat_peptide join( , , , )
FT /product=erythropoietin
FT sig_peptide join( , )
FT exon
FT /number=1
FT intron
FT /number=1
FT exon
FT /number=2
FT intron
FT /number=2
FT exon
FT /number=3
FT intron
FT /number=3
FT exon
FT /number=4
FT intron
FT /number=4
FT exon
FT /note=3' untranslated region
FT /number=5
XX
SQ Sequence 3398 BP; 698 A; 1034 C; 991 G; 675 T; 0 other;
agcttctggg cttccagacc cagctacttt gcggaactca gcaacccagg catctctgag 60
tctccgccca agaccgggat gccccccagg aggtgtccgg gagcccagcc tttcccagat 120
Slide callouts: CDS, CoDing Sequence (proposed by submitters); annotation (prediction or experimentally determined); sequence.
8 FT CDS complement( )
FT /db_xref="SPTREMBL:Q9UZ71"
FT /note="PAB2386"
FT /transl_table=11
FT /product="4-aminobutyrate qui se dilate AMINOTRANSFERASE
FT (EC )"
FT /protein_id="CAB "
FT /translation="MDYPRIVVNPPGPKAKELIEREKRVLSTGIGVKLFPLVPKRGFGP
FT FIEDVDGNVFIDFLAGAAAASTGYSHPKLVKAVKEQVELIQHSMIGYTHSERAIRVAEK
FT LVKISPIKNSKVLFGLSGSDAVDMAIKVSKFSTRRPWILAFIGAYHGQTLGATSVASFQ
FT VSQKRGYSPLMPNVFWVPYPNPYRNPWGINGYEEPQELVNRVVEYLEDYVFSHVVPPDE
FT VAAFFAEPIQGDAGIVVPPENFFKELKKLLDEHGILLVMDEVQTGIGRTGKWFASEWFE
FT VKPDMIIFGKGVASGMGLSGVIGREDIMDITSGSALLTPAANPVISAAADATLEIIEEE
FT NLLKNAIEVGSFIMKRLNELKEQFDIIGDVRGKGLMIGVEIVKENGRPDPEMTGKICWR
FT AFELGLILPSYGMFGNVIRITPPLVLTKEVAEKGLEIIEKAIKDAIAGKVERKVVTWH"
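The two-letter line codes of the EMBL flat file shown above (ID, AC, DE, KW, OS, ...) make an entry easy to split by program. The following is a minimal sketch, not a full parser: the helper names `parse_line_codes` and `keywords` are illustrative, the sample text reuses a few lines of the erythropoietin entry, and the FT feature table and sequence block are deliberately ignored.

```python
from collections import defaultdict

# A few lines of the erythropoietin entry (X02158) quoted on the slides.
SAMPLE_ENTRY = """\
ID   HSERPG standard; genomic DNA; HUM; 3398 BP.
XX
AC   X02158;
XX
DE   Human gene for erythropoietin
XX
KW   erythropoietin; glycoprotein hormone; hormone; signal peptide.
XX
OS   Homo sapiens (human)
//
"""


def parse_line_codes(text):
    """Group the lines of one flat-file entry by their two-letter code."""
    records = defaultdict(list)
    for line in text.splitlines():
        code, _, value = line.partition(" ")
        if not code or code in ("XX", "//"):  # spacer and terminator lines
            continue
        records[code].append(value.strip())
    return dict(records)


def keywords(records):
    """Return the KW line(s) as a clean list of keywords."""
    joined = " ".join(records.get("KW", [])).rstrip(".")
    return [kw.strip() for kw in joined.split(";") if kw.strip()]


entry = parse_line_codes(SAMPLE_ENTRY)
print(entry["AC"], entry["OS"], keywords(entry))
```

For real work an established parser (for example the one in Biopython) is a safer choice, since it also handles continuation lines, feature locations and the sequence itself.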
9 Proteomics databases
1. Sequence databases: «The story of a protein sequence's life»
2. Swiss-Prot: a quick overview
3. UniProt utilities: UniRef and UniParc
4. Swiss-Prot and the other protein databases
Real life of a protein sequence
[Diagram: cDNAs, ESTs, genomes, ... -> EMBL (nucleic acids, with or without annotated CDS) -> CoDing Sequences provided by submitters -> TrEMBL (amino acids) -> manually annotated -> Swiss-Prot; data not submitted to public databases, delayed or cancelled]
10 Since December 15, 2003, Swiss-Prot and TrEMBL constitute the UniProt Knowledgebase (with integration of the PIR data) -> gives access to all known* protein sequences
* submitted to the public databases (EMBL, GenBank, DDBJ, Swiss-Prot)
11 A Swiss-Prot entry = a protein sequence associated with biological information that is
- manually checked
- well structured
- periodically updated
- searchable
A TrEMBL entry = a protein sequence associated with biological information that is
- computer-annotated
- well structured
- periodically updated
- searchable
12 [Diagram: EMBL CDS -> TrEMBL -> Swiss-Prot]
Once a sequence is in Swiss-Prot, it is no longer in TrEMBL -> minimal redundancy
Annotation of conflicts
13 [Diagram: EMBL CDS -> TrEMBL -> Swiss-Prot]
How to make things clear? The names depend on the server:
- UniProt = Swiss-Prot + TrEMBL = SPTR = SWALL
- Swiss-Prot = UniProt/Swiss-Prot
- TrEMBL = UniProt/TrEMBL = SPTrEMBL
- TrEMBL = SPTrEMBL + TrEMBLnew** (** is going to disappear soon!)
EMBnet 2004: Proteomics
14 Swiss-Prot
1. Minimal redundancy;
2. Maximal manual annotation;
3. Integration with other databases.
1. Minimal redundancy: 1 gene (1 species) -> 1 Swiss-Prot entry. Identical sequences are merged, as are variants, fragments and alternative splicing isoforms.
15 Swiss-Prot
2. Maximal manual annotation:
- Function(s);
- Interactions;
- Subcellular localization and tissue expression;
- Structure (domains, ...);
- Post-translational modifications (PTMs);
- Variants (alternative splicing, polymorphisms, ...);
- Similarities
3. Integration with other databases: Release (26-Sep-2003): 83 links to other databases.
16 Up-to-date sources:
Swiss-Prot -> ExPASy (since 1986)
TrEMBL -> EBI (European Bioinformatics Institute)
You can install the ExPASyBar on your computer
Amos links
17 Search also with accession numbers (Swiss-Prot or other databases)
18 Swiss-Prot an overview View «by default» on the ExPASy server
19 ExPASy EBI NCBI
20 It is not always obvious which database your protein sequence comes from!
Topology of a Swiss-Prot entry: sequence
21 Swiss-Prot protein sequence:
- The longest sequence is usually «displayed»
- Precursor (except INIT_MET 0 and «amino acid sequencing»)
- Comparison of genomic and cDNA sequences
-> carefully checked; validated!
-> choose the most representative
-> The sequence quality is always increasing.
Swiss-Prot's daily bread: Alternative splicing? Same gene? Polymorphisms? Alternative initiation? RNA editing? Usage of an alternative promoter? Selenocysteine? Fragment? Sequencing errors?
22 Topology of a Swiss-Prot entry: Identifier; Accession Nr.; Protein name; Gene name; sequence
Always cite the primary accession number!
23 Topology of a Swiss-Prot entry: Identifier; Accession Nr.; Protein name; Gene name; Taxonomy; References; sequence
24 References Complete sequences; Fragments ; Function, characterization, interaction ; Post translational modifications; 3D structure (crystallography or NMR); Polymorphisms. Topology of a Swiss-Prot entry Comments Identifier Accession Nr. Protein name Gene name Taxonomy References sequence
25 Comment lines
- Function(s) and role(s); for enzymes: a. Catalytic activity (if EC number) b. Cofactor c. Enzyme regulation d. Pathway
- Subunit (protein/protein interactions)
- Subcellular location
- Alternative products (alt. splicing, alt. initiation, RNA editing)
- Tissue specificity (Northern and Western results)
- Developmental stage
- Induction (genetic control)
- Domain
- Post-translational modifications (PTM)
- Mass spectrometry
- Polymorphisms
- Disease
- Biotechnology
- Pharmaceutical
- Miscellaneous
- Similarities
- Caution
- Database (specialized cross-references)
Information in comment lines is derived from: publications; databases; personal communications; predictions; brainstorming
26 Experimental qualifiers:
- «-»: experimentally proved;
- «By similarity»: experimentally proved in an ortholog or in another member of the family;
- «Probable»: not proved, but realistic;
- «Potential»: predicted.
Examples: ICOL_HUMAN, O75144; AAA1_HUMAN, Q9NS82; BRH2_HUMAN, Q9NY43
27 Topology of a Swiss-Prot entry: Identifier; Accession Nr.; Protein name; Gene name; Taxonomy; References; Comments; Cross-references; sequence
Cross-references (X-ref)
- Swiss-Prot was the first database with X-ref.;
- Explicit links to 53 databases;
- Implicit X-references to 30 additional databases added by the ExPASy servers on the WWW (such as GenBank, Ensembl, ...)
=> links to 83 databases from the ExPASy servers
Currently 1.2 x 10^6 cross-references in Swiss-Prot
Gasteiger et al., Curr. Issues Mol. Biol. (2001), 3(3): 47-55
28 Swiss-Prot currently acts as the main index for the 15 federated 2D-PAGE databases (index of low redundancy).
Cross-references 1. ICE8_HUMAN Q14790: DNA, genomic, 3D
Examples of implicit links to GenBank and DDBJ added on the fly by the ExPASy server
29 Cross-references S_HUMAN P D-PAGE
30 Theoretically computed pI and MW
Experimentally determined position
Theoretically computed pI and MW with potential phosphorylation and acetylation sites
Topology of a Swiss-Prot entry: Identifier; Accession Nr.; Protein name; Gene name; Taxonomy; References; Comments; Cross-references; Keywords; sequence
31 Keywords (automated and manual annotation) Q9HC96 Calpain 10 n=481 entries
32 Topology of a Swiss-Prot entry Comments Identifier Accession Nr. Protein name Gene name Taxonomy Cross-references References Keywords Feature table sequence Sequence features: Manual annotation ICOL_HUMAN, O75144 General topology
33 Sequence features: Manual annotation ICOL_HUMAN, O75144 General topology Domains Sequence features: Manual annotation ICOL_HUMAN, O75144 General topology Domains PTM
34 Experimental qualifiers:
- «-»: experimentally proved;
- «By similarity»: experimentally proved in an ortholog or in another member of the family;
- «Probable»: not proved, but realistic;
- «Potential»: predicted.
Sequence features: Manual annotation (ICOL_HUMAN, O75144): General topology; Domains; PTM; Alternative splicing
35 All the «alternatively spliced sequences» are available on the ExPASy server in FASTA format, e.g. for BLAST searches or proteomic tools. Some proteomic tools on other servers, such as Mascot, also include these «alternatively spliced sequences» in their search engines. BRC2_HUMAN, P51587
Polymorphisms
Differences between the sequence shown and other submitted sequences
36 Swiss-Prot and PTM annotations Swiss-Prot PTM annotations References (Rx lines) Comments (CC lines) CC -!- PTM: Keywords (KW lines) KW Feature table (FT lines) FT references comments keywords features
37 references references Comments (CC PTM) The N-terminus is blocked. Phosphorylation of Tyr-660 reduces the ability of 4.1 to promote the assembly of the spectrin/actin/4.1 ternary complex. comments Sulfated.
38 Keywords
- Cleavage: Signal, Transit peptide, Protein splicing, etc.
- Linkage: Acetylation, Amidation, D-amino acid, Formylation, Glycoprotein, GPI-anchor, Hydroxylation, Hypusine, Iodination, Myristate, Palmitate, Phosphorylation, Prenylation, Sulfation, etc.
- Cross-link: Thioether bond, Thioester bond.
Features
- Cleavage: INIT_MET, PROPEP, SIGNAL, TRANSIT
- Linkage: MOD_RES, CARBOHYD, LIPID, BINDING
- Cross-link: DISULFID, CROSSLNK
39 Swiss-Prot & TrEMBL introduce a new arithmetical concept!
Redundancy in TrEMBL & redundancy between TrEMBL and Swiss-Prot
In 3 years... more than protein sequences
But in the future, redundancy is going to decrease: «new» genome sequencing -> «new» proteins (AB, Sept. 2002)
In the case of human proteins, the redundancy is still very high: about *
* human gene number estimation:
What is missing:
- Sequences not submitted to EMBL/GenBank/DDBJ (and PIR)
- Genes not yet predicted or known («no CDS provided by the submitters» or no DNA sequence)
- Confidential data (patent application sequences)
- Immunoglobulins, T-cell receptors (-> UniParc)
40 Take-home message
- Swiss-Prot is a non-redundant, manually annotated and highly cross-referenced protein knowledgebase.
- Be aware of the differences between TrEMBL and Swiss-Prot.
- Always cite the accession number, not the ID.
- We need your feedback! swiss-prot@expasy.org
Righting the wrongs
Sequences are rarely deposited in a mature state; as with all scientific research, DNA and protein annotation is a continual process of learning, revision and corrections.
Sequencing error rates: ~1 base in
«Making people aware of errors is good and great; making people aware that they're responsible also for correcting errors is even greater.» C. Hardley, EMBO reports, 4(9), 2003.
41 Proteomics databases
1. Sequence databases: «The story of a protein sequence's life»
2. Swiss-Prot: a quick overview
3. UniProt utilities: UniRef and UniParc
4. Swiss-Prot and the other protein databases
UniProt consortium (since Oct. 2002):
- The UniProt Knowledgebase (UniProt): Swiss-Prot and TrEMBL, with integration of PIR data (Release 1, Dec. 2003).
- The UniProt Non-redundant Reference (UniRef) databases combine closely related sequences into a single record to speed up BLAST searches.
- The UniProt Archive (UniParc) is a comprehensive repository, reflecting the history of all protein sequences.
42 UniRef
Useful for comprehensive BLAST searches by providing sets of representative sequences («collapsing BLAST results»).
Three collections of sequence clusters from the UniProt Knowledgebase (Swiss-Prot, TrEMBL):
- One UniRef100 entry -> all identical sequences (including fragments)
- One UniRef90 entry -> sequences that are at least 90 % identical
- One UniRef50 entry -> sequences that are at least 50 % identical
Independently of the species!
BLASTP: UniRef100
UniRef100 does not include TrEMBLnew (tn), because TrEMBLnew is going to «disappear» soon
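The UniRef100 idea (one entry for all identical sequences, fragments included, independently of the species) can be mimicked with a toy grouping routine. This is only a sketch under simplifying assumptions: a fragment is treated as an exact substring of a longer sequence, the longest sequence becomes the cluster representative, and the identifiers and sequences are invented. It does not reproduce how the UniRef databases are actually built, and UniRef90/UniRef50 would additionally require alignment-based identity.

```python
def cluster_identical(sequences):
    """Group identical sequences and exact fragments, UniRef100-style.

    `sequences` maps an identifier to a protein sequence. Returns a dict
    mapping each representative identifier to the list of its members.
    """
    clusters = {}
    # Longest sequences first, so that fragments meet their parent later on.
    ordered = sorted(sequences, key=lambda sid: (-len(sequences[sid]), sid))
    for sid in ordered:
        seq = sequences[sid]
        for representative in clusters:
            if seq in sequences[representative]:  # identical or a fragment
                clusters[representative].append(sid)
                break
        else:
            clusters[sid] = [sid]  # nothing matched: start a new cluster
    return clusters


# Hypothetical identifiers and sequences, for illustration only.
toy = {
    "A_HUMAN": "MKTAYIAK",
    "B_MOUSE": "MKTAYIAK",  # identical sequence from another species
    "C_FRAG": "TAYI",       # fragment of the same sequence
    "D_OTHER": "MKQQ",
}
print(cluster_identical(toy))
```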
43 BLASTP: UniRef100 BLASTP: UniRef90
44 BLASTP: UniRef90 BLASTP: UniRef50
45 UniParc makes it possible to keep track of a protein sequence and of its integration in various databases.
UniParc: use with extreme caution! It also contains pseudogenes, incorrect CDS predictions, etc., as well as patent office database data (EPO, ESPO, ...).
46 Proteomics databases
1. Sequence databases: «The story of a protein sequence's life»
2. Swiss-Prot: a quick overview
3. UniProt utilities: UniRef and UniParc
4. Swiss-Prot and the other protein databases
Real life of a protein sequence
[Diagram: cDNAs, ESTs, genomes, ... -> EMBL (nucleic acids) -> CoDing Sequences provided by submitters -> TrEMBL (amino acids) -> manually annotated -> Swiss-Prot; data not submitted to public databases, delayed or cancelled]
47 Real life of a protein sequence
[Diagram: flow of sequence data between databases]
- Data not submitted to public databases, delayed or cancelled
- cDNAs, ESTs, genomes, ... -> EMBL, GenBank, DDBJ
- CoDing Sequences provided by submitters -> TrEMBL, GenPept
- Manually annotated -> Swiss-Prot
- CoDing Sequences provided by submitter and «de novo» gene prediction -> RefSeq (XP_NNNNN)
- Scientific publications derived sequences -> PRF
UniProt: Swiss-Prot + TrEMBL + (PIR)
NCBI-nr: Swiss-Prot + GenPept + (PIR) + RefSeq + PDB + PRF
Protein sequences: «NR database»
48 NCBI-nr: Swiss-Prot + GenPept + (PIR) + RefSeq + PDB + PRF
- PRF: scientific publications derived sequences (integrated into TrEMBL)
- GenPept: ~TrEMBL, except that it is redundant with Swiss-Prot
- PIR: all PIR data have been integrated into Swiss-Prot and TrEMBL (UniProt)
- PDB: 3D structure database, all the protein sequences which have been crystallized (Swiss-Prot/TrEMBL are cross-linked to PDB)
NCBI Reference Sequence (RefSeq)
The RefSeq collection: genomic DNA, transcript (RNA), and protein products.
RefSeq provides a non-redundant set of sequences, derived from GenBank, the literature and gene prediction.
Release 3 includes over proteins from 2218 organisms (including 1100 viruses and 150 bacteria). (!!! 1 entry = 1 sequence.)
The sequence data are tightly linked to LocusLink, which contains the associated biological information («interdependent curated resources»).
49 Example 1 Search for a gene name
50 Protein sequences: «NR database» AMBN 20 entries Swiss-Prot
51 «Entrez protein AMBN» Genpept Genpept RefSeq RefSeq RefSeq AC KW Taxonomy References Correspond to Swiss-Prot entry AMBN_HUMAN Q9NP70 GenBank source GenBank source
52 used for the construction of the RefSeq entry Description of the sequence differences Annotation
53 Example 2 BLAST searches Human EPO: Blastp against Swiss-Prot/TrEMBL (at the ExPASy server) *
54 Human EPO: Blastp against NR
All these human sequences are integrated into the corresponding Swiss-Prot entry with the annotation of their differences (conflicts, variants, fragments, ...)
NCBI-nr: Swiss-Prot + GenPept + (PIR) + RefSeq + PDB + PRF
- PRF: scientific publications derived sequences (integrated into TrEMBL)
- GenPept: ~TrEMBL, except that it is redundant with Swiss-Prot
- PIR: all PIR data have been integrated into Swiss-Prot and TrEMBL (UniProt)
- PDB: 3D structure database, all the protein sequences which have been crystallized (Swiss-Prot/TrEMBL are cross-linked to PDB)
55 PDB: Protein Data Bank
- Managed by the Research Collaboratory for Structural Bioinformatics (RCSB) (USA).
- Contains macromolecular structure data on proteins, nucleic acids, protein-nucleic acid complexes, and viruses. Proteins represent more than 90 % of the available structures.
- Contains the spatial coordinates of macromolecules whose 3D structure has been obtained by X-ray or NMR studies.
- Specialized programs allow the visualization of the corresponding 3D structure (e.g., SwissPDB-viewer, Chime, Rasmol).
- Currently there are structural data for about molecules, but far fewer protein families (highly redundant)!
PDB: example
HEADER LYASE(OXO-ACID) 01-OCT-91 12CA 12CA 2
COMPND CARBONIC ANHYDRASE /II (CARBONATE DEHYDRATASE) (/HCA II) 12CA 3
COMPND 2 (E.C ) MUTANT WITH VAL 121 REPLACED BY ALA (/V121A) 12CA 4
SOURCE HUMAN (HOMO SAPIENS) RECOMBINANT PROTEIN 12CA 5
AUTHOR S.K.NAIR,D.W.CHRISTIANSON 12CA 6
REVDAT 1 15-OCT-92 12CA 0 12CA 7
JRNL AUTH S.K.NAIR,T.L.CALDERONE,D.W.CHRISTIANSON,C.A.FIERKE 12CA 8
JRNL TITL ALTERING THE MOUTH OF A HYDROPHOBIC POCKET. 12CA 9
JRNL TITL 2 STRUCTURE AND KINETICS OF HUMAN CARBONIC ANHYDRASE 12CA 10
JRNL TITL 3 /II$ MUTANTS AT RESIDUE VAL CA 11
JRNL REF J.BIOL.CHEM. V CA 12
JRNL REFN ASTM JBCHA3 US ISSN CA 13
REMARK 1 12CA 14
REMARK 2 12CA 15
REMARK 2 RESOLUTION. 2.4 ANGSTROMS. 12CA 16
REMARK 3 12CA 17
REMARK 3 REFINEMENT. 12CA 18
REMARK 3 PROGRAM PROLSQ 12CA 19
REMARK 3 AUTHORS HENDRICKSON,KONNERT 12CA 20
REMARK 3 R VALUE CA 21
REMARK 3 RMSD BOND DISTANCES ANGSTROMS 12CA 22
REMARK 3 RMSD BOND ANGLES 1.3 DEGREES 12CA 23
REMARK 4 12CA 24
REMARK 4 N-TERMINAL RESIDUES SER 2, HIS 3, HIS 4 AND C-TERMINAL 12CA 25
REMARK 4 RESIDUE LYS 260 WERE NOT LOCATED IN THE DENSITY MAPS AND, 12CA 26
REMARK 4 THEREFORE, NO COORDINATES ARE INCLUDED FOR THESE RESIDUES. 12CA 27
56 PDB (cont.)
SHEET 3 S10 PHE 66 PHE 70-1 O ASN 67 N LEU 60 12CA 68
SHEET 4 S10 TYR 88 TRP 97-1 O PHE 93 N VAL 68 12CA 69
SHEET 5 S10 ALA 116 ASN O HIS 119 N HIS 94 12CA 70
SHEET 6 S10 LEU 141 VAL O LEU 144 N LEU CA 71
SHEET 7 S10 VAL 207 LEU O ILE 210 N GLY CA 72
SHEET 8 S10 TYR 191 GLY O TRP 192 N VAL CA 73
SHEET 9 S10 LYS 257 ALA O LYS 257 N THR CA 74
SHEET 10 S10 LYS 39 TYR 40 1 O LYS 39 N ALA CA 75
TURN 1 T1 GLN 28 VAL 31 TYPE VIB (CIS-PRO 30) 12CA 76
TURN 2 T2 GLY 81 LEU 84 TYPE II(PRIME) (GLY 82) 12CA 77
TURN 3 T3 ALA 134 GLN 137 TYPE I (GLN 136) 12CA 78
TURN 4 T4 GLN 137 GLY 140 TYPE I (ASP 139) 12CA 79
TURN 5 T5 THR 200 LEU 203 TYPE VIA (CIS-PRO 202) 12CA 80
TURN 6 T6 GLY 233 GLU 236 TYPE II (GLY 235) 12CA 81
CRYST P CA 82
ORIGX CA 83
ORIGX CA 84
ORIGX CA 85
SCALE CA 86
SCALE CA 87
SCALE CA 88
ATOM 1 N TRP CA 89
ATOM 2 CA TRP CA 90
ATOM 3 C TRP CA 91
ATOM 4 O TRP CA 92
ATOM 5 CB TRP CA 93
ATOM 6 CG TRP CA 94
ATOM 7 CD1 TRP CA 95
ATOM 8 CD2 TRP CA 96
ATOM 9 NE1 TRP CA 97
ATOM 10 CE2 TRP CA 98
ATOM 11 CE3 TRP CA 99
ATOM 12 CZ2 TRP CA 100
ATOM 13 CZ3 TRP CA 101
ATOM 14 CH2 TRP CA 102
...
Slide callouts: coordinates of each atom; the same PDB entry visualized with Chime.
57 3D structure databases: others
They are all derived from PDB data!
- HSSP: Homology-derived secondary structure of proteins
- FSSP: structural alignment
- SCOP: Structural classification of proteins
- CATH: hierarchical domain classification of protein structures
- HomStrad: HOMologous STRucture Alignment Database
- DALI server (EBI): network service for comparing protein structures in 3D
Protein databases used by the protein identification tools: the jungle
58 PROWL: NCBInr, Swiss-Prot, dbEST
Protein Prospector: NCBInr, Swiss-Prot, dbEST, GenPept, Ludwignr, OWL*
PeptIdent (Aldente): Swiss-Prot, TrEMBL
Mascot: NCBInr, Swiss-Prot, dbEST, OWL*, MSDB
* OWL has been obsolete since 1999
Matrix Science (Mascot) sequence databases
MSDB: non-identical protein sequence database. Contains sequences derived from:
- PIR (now integrated into UniProt (Swiss-Prot/TrEMBL))
- TrEMBL
- REMTrEMBL (does not exist anymore, see UniParc)
- GenBank
- Swiss-Prot
- NRL3D (PDB derived sequences)
59 The AC number jungle
Type of record: sample accession format
- GenBank/EMBL/DDBJ: one letter followed by five digits, e.g. U12345; two letters followed by six digits, e.g. AF
- Swiss-Prot/TrEMBL: one letter (O, P, Q) and five digits/letters, e.g. P12345
- RefSeq nucleotide: two letters, underscore bar and six digits, e.g. mRNA NM_, genomic NT_
- RefSeq protein: e.g. NP_00483
- RefSeq prediction: e.g. XM_, XP_
- PDB (protein structure): one digit followed by three letters, e.g. 1TUP
The end of part I
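The accession formats listed above translate directly into regular expressions, which also shows why it is a «jungle»: some accessions fit more than one format. The sketch below is an assumption-laden illustration, not an official validator. The patterns follow the slide's wording, except that the PDB pattern accepts digits after the first character (so that 12CA, used earlier, also matches) and the RefSeq patterns accept any number of digits. The function name `classify_accession` is invented for the example.

```python
import re

# Patterns derived from the slide's description; real accession rules are
# broader and have been extended since 2004.
ACCESSION_PATTERNS = [
    ("GenBank/EMBL/DDBJ", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")),
    ("Swiss-Prot/TrEMBL", re.compile(r"[OPQ][A-Z0-9]{5}")),
    ("RefSeq nucleotide", re.compile(r"(NM|NT)_\d+")),
    ("RefSeq protein", re.compile(r"NP_\d+")),
    ("RefSeq prediction", re.compile(r"(XM|XP)_\d+")),
    ("PDB", re.compile(r"\d[A-Z0-9]{3}")),
]


def classify_accession(accession):
    """Return every record type whose format the accession matches."""
    acc = accession.strip().upper()
    return [name for name, pattern in ACCESSION_PATTERNS if pattern.fullmatch(acc)]


for acc in ("U12345", "P12345", "Q9NP70", "NP_00483", "1TUP"):
    print(acc, classify_accession(acc))
```

Note that P12345 matches both the nucleotide and the Swiss-Prot formats, so the format alone cannot always tell which database an accession belongs to.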
60 PART II Protein characterization tools
What can we learn in silico from an amino acid sequence?
1. Domain, family attribution
2. Subcellular location
3. Post-translational modifications (PTMs)
61 What can we learn in silico from an amino acid sequence?
1. Domain, family attribution
2. Subcellular location
3. Post-translational modifications (PTMs)
Protein domain/family: some definitions
- Most proteins have «modular» structures.
- Estimation: ~3 domains / protein.
- Domains not only share a common structure but often also have a similar function that contributes to the global activity of the proteins which contain them.
62 Domains are identified by multiple sequence alignments
Domains can be defined by different methods:
- Pattern (regular expression); used for very conserved domains
- Profiles (weighted matrices): two-dimensional tables of position-specific match, gap and insertion scores, derived from aligned sequence families; used for less conserved domains
- Hidden Markov Model (HMM); probabilistic models; another method to generate profiles.
Pattern-Profile
Pattern: [LIVM]-[ST]-A-[STAG]-H-C (yes or no)
Profile (score/threshold):
ID TRYPSIN_DOM; MATRIX.
AC PS50240;
DT DEC-2001 (CREATED); DEC-2001 (DATA UPDATE); DEC-2001 (INFO UPDATE).
DE Serine proteases, trypsin domain profile.
MA /GENERAL_SPEC: ALPHABET='ABCDEFGHIKLMNPQRSTVWYZ'; LENGTH=234;
MA /DISJOINT: DEFINITION=PROTECT; N1=6; N2=229;
MA /NORMALIZATION: MODE=1; FUNCTION=LINEAR; R1=0.0169; R2= ; TEXT='-LogE';
MA /CUT_OFF: LEVEL=0; SCORE=1134; N_SCORE=9.5; MODE=1; TEXT='!';
MA /CUT_OFF: LEVEL=-1; SCORE=775; N_SCORE=6.5; MODE=1; TEXT='?';
MA /DEFAULT: M0=-9; D=-20; I=-20; B1=-60; E1=-60; MI=-105; MD=-105; IM=-105; DM=-105;
MA /I: B1=0; BI=-105; BD=-105;
MA A B D E F G H I K L M N P Q R S T V W Y
MA /M: SY='I'; M= -8,-29,-34,-26, 3,-34,-24, 34,-26, 19, 15,-24,-21,-21,-24,-19, -8, 25,-19, 3;
MA /M: SY='N'; M= 0, 14, 10, 1,-22, -1, 6,-23, -4,-26,-17, 20,-14, -1, -6, 13, 2,-20,-34,-15;
MA /M: SY='E'; M= -4, 4, 7, 14,-26,-13, -7,-23, 3,-22,-16, 2, 7, 3, -3, 2, -2,-21,-30,-18;
MA /M: SY='R'; M=-12, 5, 5, 7,-23,-17, 3,-24, 8,-20,-12, 7,-16, 10, 12, -2, -6,-21,-27, -9;
MA /M: SY='W'; M=-16,-33,-35,-27, 13,-22,-24,-11,-18,-13,-13,-31,-27,-20,-18,-30,-21,-18, 97, 25;
MA /M: SY='V'; M= 1,-29,-31,-28, -1,-30,-29, 31,-22, 13, 11,-27,-27,-26,-22,-12, -2, 41,-27, -8;
MA /M: SY='L'; M= -8,-29,-31,-22, 9,-30,-21, 23,-27, 37, 20,-28,-28,-21,-20,-25, -8, 17,-20, -1;
MA /M: SY='T'; M= 2, -1, -9, -9,-11,-17,-19,-10,-10,-13,-11, 1,-11, -9,-10, 23, 43, 0,-32,-12;
MA /M: SY='A'; M= 45, -9,-19,-10,-20, -2,-15,-11,-10,-11,-10, -9,-11, -9,-19, 10, 1, -1,-21,-18;
MA /M: SY='A'; M= 40, -9,-17, -8,-21, 5,-18,-14, -9,-13,-12, -8,-11, -9,-16, 9, -2, -5,-21,-21;
MA /M: SY='H'; M=-18, 0, 0, 1,-21,-19, 89,-29, -8,-21, -1, 9,-19, 11, 0, -7,-17,-29,-30, 16;
MA /M: SY='C'; M= -9,-18,-28,-29,-20,-29,-29,-29,-29,-20,-19,-18,-39,-29,-29, -9, -9, -9,-49,-29;
MA /I: E1=0; IE=-105; DE=-105;
//
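A pattern such as [LIVM]-[ST]-A-[STAG]-H-C is a regular expression in PROSITE syntax, so its «yes or no» behaviour is easy to reproduce. The sketch below converts the most common PROSITE elements (residues, [allowed], {forbidden}, x, repeat counts, < and > anchors) into a Python regular expression. The function names and the test sequence are invented for illustration, and rarer syntax is not handled.

```python
import re

# One pattern element, optionally followed by a repeat count such as (3) or (2,4).
ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        core, low, high = ELEMENT.fullmatch(element).groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low and high:
            repeat = "{%s,%s}" % (low, high)
        elif low:
            repeat = "{%s}" % low
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


print(prosite_to_regex("[LIVM]-[ST]-A-[STAG]-H-C"))
print(find_matches("[LIVM]-[ST]-A-[STAG]-H-C", "AAVLTAAHCGG"))
```

A pattern either matches or it does not, whereas a profile assigns each alignment a score that is compared with a cut-off, which is why profiles cope better with less conserved domains.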
63 Protein domain/family databases
Member databases of InterPro:
- PROSITE: patterns / profiles
- ProDom: aligned motifs (PSI-BLAST) (Pfam B)
- PRINTS: aligned motifs
- Pfam: HMM (Hidden Markov Models)
- SMART: HMM
- TIGRfam: HMM
Others:
- DOMO: aligned motifs
- BLOCKS: aligned motifs (PSI-BLAST)
- CDD (CDART): PSI-BLAST (PSSM) of Pfam and SMART
InterPro
Searches many domain databases simultaneously (PRINTS, PROSITE, Pfam, ProDom, SMART, and TIGRFAMs). Each entry contains a unique AC, a functional description of the domain and references. Links are made back to the relevant member databases.
64
65 What can we learn in silico from an amino acid sequence?
1. Domain, family attribution
2. Subcellular location
3. Post-translational modifications (PTMs)
Protein pathway in Eukaryota ---> per default with a specific signal
Secretory pathway
66
67 What can we learn in silico from an amino acid sequence?
1. Domain, family attribution
2. Subcellular location
3. Post-translational modifications (PTMs)
From genome to proteome: protein complexity
- ~ human genes
- alternative splicing of mRNA: 2-5 fold increase -> ~ human transcripts
- post-translational modifications of proteins (PTMs): 5-10 fold increase -> ~ 1'000'000 human proteins
68 PTM diversity
[Figure: PTM diversity; labels: GPI, Myr, Ngly, Ogly, Pho, Sul]
Abbreviations: Am: Amidation; AcN: Acetylation N-terminal; AcI: Acetylation internal; Alk: Alkylation; Adp: ADP-ribosylation; Bio: Biotinylation; Bro: Bromination; Cgly: C-linked glycosylation; Ogly: O-linked glycosylation; Ngly: N-linked glycosylation; Dea: Deamidation; Sul: Sulfation; Far: Farnesylation; Ger: Geranylgeranylation; GPI: GPI-anchoring; Met: Methylation; Myr: Myristoylation; Hyd: Hydroxylation; Pho: Phosphorylation; Pal: Palmitoylation; Pyr: Pyrrolidone carboxylic acid; Oxo: 2-amino-3-oxopropionic acid
Three major categories:
- cleavage: initiator Met, signal and transit peptides, propeptides, complex processing, etc.
- linkage: simple chemical groups (phosphate, sulfate, methyl, hydroxyl, acetate, etc.); complex molecules (N-, O- or C-linked glycans, lipids, e.g. palmitate, myristate, GPI)
- x-linking: disulfide bonds, thioester, thioether bonds, etc.
69 PTM database
RESID is a database of protein post-translational modifications with descriptive, chemical, structural and bibliographic information. It contains 351 entries (last update Nov. 2003).
70 PTM prediction tools PTM prediction on ExPASy + PROSITE predictions (n~15)
71 PTM prediction
-> Beware of «biological consistency»!
-> Organisms (Eubacteria, Archaea, Eukaryota)
-> Subcellular location
  -> secretory pathway (ER, Golgi)
  -> shuttle between organelles
  -> topology
-> A well-characterized orthologous protein
72 Some statistics
Number of PTMs in Swiss-Prot release 40 (all organisms)
[Table: columns Exp., By sim., Pot./prob., total; rows: signal peptide, N-GlcNAc, O-GalNAc, O-GlcNAc, phosphorylation, sulfation, myristate, GPI-anchor (108)]
Total number of proteins < total number of PTMs
PTM annotation in Swiss-Prot: all organisms
[Chart: acetyl, phosphate, methyl, sulfate; total vs. proven]
73 We need your help! The end of part II
More informationStructural bioinformatics
Structural bioinformatics Why structures? The representation of the molecules in 3D is more informative New properties of the molecules are revealed, which can not be detected by sequences Eran Eyal Plant
More informationBioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine
Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Overview This lecture will
More informationWill discuss proteins in view of Sequence (I,II) Structure (III) Function (IV) proteins in practice
Will discuss proteins in view of Sequence (I,II) Structure (III) Function (IV) proteins in practice integration - web system (V) 1 Touring the Protein Space (outline) 1. Protein Sequence - how rich? How