1. Proteomics database contents Protein sequence databases
1 1. Proteomics database contents: Protein sequence databases. Salvador Martínez de Bartolomé (smartinez@proteored.org), Bioinformatics support, ProteoRed Proteomics Facility, National Center for Biotechnology, Madrid
2 Menu: Introduction: bioinformatics and sequence databases; Nucleic acid sequence databases; Protein sequence databases (sources); Protein sequence databases (other)
3 Biology of the 21st century. Three major developments: high-throughput analysis techniques (DNA sequencing, mass spectrometry, microarrays); numerous biological databases available through the Web; bioinformatics tools available through the Web.
4 An overwhelming number of unordered resources
5 Figure: map of bioinformatics resource categories, each linked to a database or tool: protein sequence; nucleotide sequence repository; nucleotide-to-amino-acid translator; sequence alignment; similarity search; pattern & profile search; domains & classification; primary (1°) structure analysis; secondary (2°) structure prediction; 2° structure database; tertiary (3°) structure; 3° structure prediction; topology prediction; subcellular localization; PTM; PTM prediction tool; protein 2D PAGE & MS; protein identification & characterization; gene expression; protein interactions; species/genomic; functional; polymorphism/mutation/disease database; phylogenetics & taxonomy; references/nomenclature.
6 Figure: the same map populated with example resources: UniProtKB (Swiss-Prot/TrEMBL), TargetP, EcoGene, Ensembl, FlyBase, MGD, SGD, SubtiList, TIGR CMR, HIV, TAIR, MEROPS, ENZYME, TRANSFAC, KEGG, HAMAP, PROSITE, InterPro, Pfam, ProDom, BLOCKS, TIGRFAM, ProtoMap, CATH, SCOP, PDB, SWISS-MODEL, ScanProsite, MotifScan, HSSP, Jpred, GOR, DIP, IntAct, ProtScale, ProtParam, BLAST, FASTA, dbSNP, GeneCards, OMIM, CleanEx, DDBJ, GenBank, EMBL, TreeBase, NEWT, Taxonomy, PSORT, GlycoSuite, PhosphoBase, NetOGlyc, ChloroP, PeptideMass, Mascot, Phenyx, ECO2DBASE, SIENA-2DPAGE, SWISS-2DPAGE, TMHMM, SOSUI, PubMed, HUGO, GO, ClustalW, DIALIGN, Translate.
7 Molecular bioinformatics, an operational definition: the application of computer science to molecular biology, in particular to the study of macromolecules such as proteins, nucleic acids and oligosaccharides.
8 Protein sequence databases, uses and requirements:
- identification of proteins by proteomics -> completeness, sequence quality
- similarity searches (functional prediction) -> sequence quality (non-redundancy)
- training datasets (prediction tools) -> sequence and annotation quality
- genome annotation
9 Proteome complexity Not predictable at the genome level! (Jensen O.N., Curr. Opin. Chem. Biol., 2004, 8, 33-41, PMID: ).
10 Avalanche of sequence data
12 ~ 1630 genomes sequenced (single organism, varying sizes) ~ 952 ongoing genome sequencing projects
15 ~1630 genomes sequenced (single organism, varying sizes); ~952 ongoing genome sequencing projects; ~200 metagenome sequencing projects (environmental samples: multiple unknown organisms, varying sizes). Ecological metagenomes: beach sand, Sargasso Sea. Organismal metagenomes: mouse gut. ~17 million sequences being processed at the Venter Institute.
16 How many protein sequences in the end? For fun, an estimate: ~30 million species (1.5 million named):
- 20 million bacteria/archaea x 4'000 genes
- 5 million protists x 6'000 genes
- 3 million insects x 14'000 genes
- 1 million fungi x 6'000 genes
- 0.6 million plants x 20'000 genes
- 0.2 million molluscs, worms, arachnids, etc. x 20'000 genes
- 0.2 million vertebrates x 25'000 genes
The calculation: $2\times10^{7}\times4000 + 5\times10^{6}\times6000 + 3\times10^{6}\times14000 + 10^{6}\times6000 + 6\times10^{5}\times20000 + 2\times10^{5}\times20000 + 2\times10^{5}\times25000 = 1.79\times10^{11}$, i.e. 179'000'000'000 (AMB, SP20)
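The slide's arithmetic can be checked in a few lines of Python. The group sizes and gene counts are the slide's own rough figures. The dictionary and function names are ours, and the sketch assumes one protein per gene, as the slide does.

```python
# Back-of-the-envelope estimate from the slide:
# group -> (number of species, average number of genes per genome).
GROUPS = {
    "bacteria/archaea": (20_000_000, 4_000),
    "protists": (5_000_000, 6_000),
    "insects": (3_000_000, 14_000),
    "fungi": (1_000_000, 6_000),
    "plants": (600_000, 20_000),
    "molluscs, worms, arachnids, etc.": (200_000, 20_000),
    "vertebrates": (200_000, 25_000),
}


def total_species(groups: dict[str, tuple[int, int]]) -> int:
    """Sum of species over all groups."""
    return sum(n_species for n_species, _ in groups.values())


def total_proteins(groups: dict[str, tuple[int, int]]) -> int:
    """Sum of species x genes per genome, i.e. one protein per gene."""
    return sum(n_species * genes for n_species, genes in groups.values())


if __name__ == "__main__":
    print(f"species:  {total_species(GROUPS):,}")
    print(f"proteins: {total_proteins(GROUPS):,}")
```

Running it prints 30,000,000 species and 179,000,000,000 proteins, which matches the totals on the slide.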
17 Protein sequence origin: about 4.5 million known protein sequences (in 2007). More than 99% of protein sequences are derived from the translation of nucleotide sequences; less than 1% come from direct protein sequencing (Edman, MS/MS). -> It is important that users know where a protein sequence comes from (sequencing and gene-prediction quality)!
18 Menu: Introduction: bioinformatics and sequence databases; Nucleic acid sequence databases; Protein sequence databases (sources); Protein sequence databases (other)
19 The hectic life of a sequence (figure): cDNAs, ESTs (expressed sequence tags), genes and genomes are submitted to EMBL, GenBank or DDBJ; some data are not submitted to public databases*, or their submission is delayed or cancelled.
20 Contribution: EMBL 10 %; GenBank 75 %; DDBJ 15 %
21 Goal: to accept, process and make freely available sequence data from individual researchers, research groups and patent offices; the data are available via SRS/Entrez, FTP, web services and similarity search tools.
22 The tremendous increase in nucleotide sequences. 1980: 80 genes fully sequenced!
23 EMBL/GenBank/DDBJ serve as archives: nothing goes out. They contain all public sequences derived from: genome projects (> 80% of entries); sequencing centers (cDNAs, ESTs); individual scientists (15% of entries); patent offices (e.g. European Patent Office, EPO). Currently: ~152x10^6 sequences, ~242x10^9 bp; sequences from > different species.
24 More than species, but human, mouse and rat stand out: they are the organisms with the highest redundancy!
25 Where was the sequenced specimen collected? Geographical origin of sequenced samples (since 2005), recorded with the /lat_lon (latitude/longitude) qualifier.
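To show how the /lat_lon qualifier can be used downstream, the sketch below converts a qualifier value into signed decimal degrees. It assumes the value layout "<degrees> N|S <degrees> E|W" (for example "47.94 N 28.12 W"). Check the INSDC feature-table definition before relying on this layout. `parse_lat_lon` is a hypothetical helper, not part of any library.

```python
import re

# Assumed /lat_lon value layout: "<degrees> N|S <degrees> E|W", e.g. "47.94 N 28.12 W".
LAT_LON = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+([NS])\s+(\d+(?:\.\d+)?)\s+([EW])\s*$")


def parse_lat_lon(value: str) -> tuple[float, float]:
    """Convert a /lat_lon value into signed decimal degrees (latitude, longitude).

    South latitudes and west longitudes become negative numbers.
    Raises ValueError if the value does not follow the assumed layout.
    """
    match = LAT_LON.match(value)
    if match is None:
        raise ValueError(f"unrecognised lat_lon value: {value!r}")
    lat, ns, lon, ew = match.groups()
    latitude = float(lat) * (-1 if ns == "S" else 1)
    longitude = float(lon) * (-1 if ew == "W" else 1)
    if abs(latitude) > 90 or abs(longitude) > 180:
        raise ValueError(f"coordinates out of range: {value!r}")
    return latitude, longitude
```

For example, `parse_lat_lon("47.94 N 28.12 W")` returns `(47.94, -28.12)`. That pair is ready for plotting sample origins on a map.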
26 EMBL/GenBank/DDBJ: a very important annotation for proteomics is the CoDing Sequence (CDS), in particular for eukaryotes.
27 (figure) Nucleotide entries (cDNAs, ESTs, genes, genomes) are submitted to EMBL, GenBank or DDBJ, with or without an annotated CDS provided by the authors; some data are not submitted to public databases*, or their submission is delayed or cancelled. CDS (CoDing Sequence): the portion of DNA/RNA translated into protein (from Met to STOP); either experimentally proven or derived from gene prediction.
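To make "from Met to STOP" concrete, here is a minimal Python sketch that translates a nucleotide sequence with the standard genetic code (NCBI translation table 1). For simplicity it assumes that the CDS starts at the first ATG. Real CDS annotations give explicit start and stop coordinates instead. The function name `translate_cds` is ours.

```python
BASES = "TCAG"
# NCBI translation table 1 (standard code); codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: STANDARD_CODE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first in-frame stop codon."""
    seq = dna.upper().replace("U", "T")  # accept RNA as well as DNA
    start = seq.find("ATG")
    if start == -1:
        raise ValueError("no ATG start codon found")
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            return "".join(protein)
        protein.append(aa)
    raise ValueError("no in-frame stop codon found")
```

For example, `translate_cds("ccATGGCCAAATAGgg")` reads ATG GCC AAA and stops at TAG, giving `"MAK"`.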
28 5 Problems
29 Problem 1: a complete genome submitted with only ~2,015 CDSs available!
30 Species distribution at the nucleic acid level (human, mouse, rat) compared with the protein level (example with UniProtKB/TrEMBL): the CDSs of viruses and bacteria are easy to obtain!
31 Problem 2: variable level of sequence quality (sequencing quality; gene-prediction quality). Authors can specify the nature of the CDS using the qualifier "/evidence=experimental" or "/evidence=not_experimental". This is very rarely done.
33 UniProtKB/Swiss-Prot protein knowledgebase release 56.6 statistics (16-Dec-08), protein existence (PE):
1: Evidence at protein level: 15.3%
2: Evidence at transcript level: 15.8%
3: Inferred from homology: 65.2%
4: Predicted: 3.4%
5: Uncertain: 0.3%
34 Problem 3: high redundancy. A sort of sequence museum, where sequences are preserved for eternity exactly as they were determined, interpreted and originally published by their authors (primary sequence repository) -> similarity searches are not straightforward.
35 Problem 4: author authority -> variable quality of the annotation (CDS and other), e.g. gene/protein name attribution.
36 EMBL/GenBank/DDBJ: the authors have full authority over the content of the entries they submit! (Editorial control of the content belongs to the authors.) Exception: TPA (Third Party Annotation), since January 2003.
37 Problem 5: environmental samples
38 Environmental sequences (ENV). Aim: to sequence all the DNA present in a given sample, without knowing which species the DNA is derived from. Examples: the Sargasso Sea (Craig Venter), human fluids, earth.
41 No idea of the species (microbial population). No idea of which gene prediction program to use. No idea of which genetic code to use for translation! Not always associated with a CDS; if it is, the protein sequences are present in protein sequence databases.
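Why the choice of genetic code matters for environmental sequences: the same DNA yields different proteins under different translation tables. The sketch includes only two NCBI tables. Table 1 is the standard code. Table 4 is the mold/protozoan mitochondrial and Mycoplasma/Spiroplasma code, in which TGA encodes Trp instead of STOP. The DNA fragment in the usage note is made up.

```python
BASES = "TCAG"
# NCBI translation tables; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
GENETIC_CODES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # standard
    4: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # TGA = Trp
}
INDEX = {base: n for n, base in enumerate(BASES)}


def translate(dna: str, table_id: int = 1) -> str:
    """Translate frame 1 of `dna` with an NCBI table; '*' marks stops, 'X' ambiguous codons."""
    code = GENETIC_CODES[table_id]
    seq = dna.upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if all(base in INDEX for base in codon):
            protein.append(code[16 * INDEX[codon[0]] + 4 * INDEX[codon[1]] + INDEX[codon[2]]])
        else:
            protein.append("X")  # e.g. a codon containing N
    return "".join(protein)
```

With `"ATGTGGTGAAAATAA"`, table 1 gives `"MW*K*"`, a protein truncated after two residues. Table 4 gives `"MWWK*"`. A wrong guess about the organism therefore changes the predicted protein.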
42 Menu: Introduction: bioinformatics and sequence databases; Nucleic acid sequence databases; Protein sequence databases (sources); Protein sequence databases (other)
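43 From nucleic acid databases to protein sequence databases (figure): cDNAs, ESTs and genomes go to EMBL, GenBank or DDBJ (some data are never submitted to public databases, or their submission is delayed or cancelled). Nucleic acid entries with no CDS stay in the nucleic acid databases. A sequence reaches the protein sequence databases if the submitters provide an annotated coding sequence (CDS) (1/10 EMBL entries) or through gene prediction.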
44 Major protein sequence sources: PIR, PDB, PRF, UniProtKB (Swiss-Prot + TrEMBL) and NCBI-nr (figure: integrated resources linked by cross-references vs separated resources).
- NCBI-nr: Swiss-Prot + GenPept + PIR + PDB + PRF + RefSeq
- UniProtKB/Swiss-Prot: manually annotated protein sequences
- UniProtKB/TrEMBL: submitted CDSs (EMBL) + automated annotation; non-redundant with Swiss-Prot
- GenPept: submitted CDSs (GenBank); redundant with Swiss-Prot
- PIR: Protein Information Resource; archive since 2003; integrated into UniProtKB
- PDB: Protein Data Bank: 3D data and associated sequences
- PRF: journal scan of published peptide sequences
- RefSeq: Reference Sequence for DNA, RNA and protein + gene prediction (4,000 species)
45 UniProt, the Universal Protein Resource, is maintained by the UniProt consortium: SIB (Swiss Institute of Bioinformatics), EBI (European Bioinformatics Institute) and PIR (Protein Information Resource).
48 The UniProt Knowledgebase (UniProtKB): an encyclopedia of proteins, released biweekly.
49 EMBL -> TrEMBL: automated extraction of the protein sequence (translated CDS), gene name and references, plus automated annotation.
50 Warning: the quality of UniProtKB/TrEMBL data, including the protein sequence, depends directly on the information provided by the submitter of the original nucleotide entry. Automated annotation relies on rules derived from manually annotated Swiss-Prot entries, applied without manual oversight (RuleBase), and on automatically generated rules (Spearmint).
51 EMBL -> TrEMBL (automated extraction of the protein sequence (translated CDS), gene name and references, plus automated annotation) -> Swiss-Prot (manual annotation of the sequence and associated biological information).
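In UniProtKB FASTA files, the header shows which section an entry belongs to. "sp" marks Swiss-Prot (reviewed) entries and "tr" marks TrEMBL (unreviewed) ones. The parser below assumes the current header layout `>db|accession|entry_name description OS=... OX=... GN=... PE=... SV=...`. The function name is ours, and the example headers are illustrative.

```python
import re

HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s*(.*)$")
TAG = re.compile(r"\b([A-Z]{2})=(.*?)(?=\s+[A-Z]{2}=|$)")  # OS=..., OX=..., GN=..., PE=..., SV=...


def parse_uniprot_header(line: str) -> dict[str, str]:
    """Split a UniProtKB FASTA header into section, accession, entry name, description and tags."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a UniProtKB FASTA header: {line!r}")
    db, accession, entry_name, rest = match.groups()
    description = re.split(r"\s+[A-Z]{2}=", rest, maxsplit=1)[0]
    return {
        "section": "Swiss-Prot (reviewed)" if db == "sp" else "TrEMBL (unreviewed)",
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
        **dict(TAG.findall(rest)),
    }
```

Filtering a downloaded FASTA file on `section` is a quick way to restrict a search to manually annotated entries.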
52 UniProtKB from TrEMBL to Swiss-Prot Sequence check
53 UniProtKB/Swiss-Prot: 1 entry <-> 1 gene (1 species)
i) Merge of all known protein sequences (CDSs) derived from the same gene -> avoids redundancy and improves sequence reliability (for human: ~6 different sequence reports per entry)
ii) Annotation of the sequence differences (conflicts, polymorphisms, splice variants, etc.) -> annotation of protein diversity
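The idea of merging several sequence reports into one entry and recording where they disagree can be sketched as below. This is a toy under strong assumptions. The reports must have equal length (no insertions or deletions), and the consensus is a simple majority vote. Real Swiss-Prot curation is manual and distinguishes conflicts, polymorphisms and splice variants. The report names are hypothetical.

```python
from collections import Counter


def merge_reports(reports: dict[str, str]) -> tuple[str, list[tuple[int, str, dict[str, str]]]]:
    """Merge equal-length translations of one gene (toy version).

    Returns a majority-vote consensus and, for each position where reports disagree,
    (1-based position, consensus residue, {report name: deviating residue}).
    """
    lengths = {len(seq) for seq in reports.values()}
    if len(lengths) != 1:
        raise ValueError("this sketch only handles equal-length sequences (no indels)")
    consensus, differences = [], []
    for pos in range(lengths.pop()):
        column = {name: seq[pos] for name, seq in reports.items()}
        # On ties, Counter.most_common keeps the residue encountered first.
        winner, _ = Counter(column.values()).most_common(1)[0]
        consensus.append(winner)
        deviating = {name: aa for name, aa in column.items() if aa != winner}
        if deviating:
            differences.append((pos + 1, winner, deviating))
    return "".join(consensus), differences
```

With three reports `MKTAYIAK`, `MKTAYIAK` and `MKTVYIAK`, the consensus is `MKTAYIAK`. One difference is reported at position 4 (A vs V). A curator would then have to classify it.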
54 Righting the wrongs: sequences are rarely deposited in a mature state; as with all scientific research, DNA and protein annotation is a continual process of learning, revision and correction. Sequencing error rates: ~1 base in
55 Protein existence (PE): indicates whether evidence exists that proves the existence of a protein. Five levels:
1. Evidence at protein level (~15.3%)
2. Evidence at transcript level (~15.8%)
3. Inferred from homology (~65.2%)
4. Predicted (~3.4%)
5. Unassigned (mainly in TrEMBL) (0.3%)
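In UniProtKB flat files the PE level appears on a line such as `PE   1: Evidence at protein level;`. The sketch below reads that line and tallies a set of levels into percentages, like the release statistics shown earlier. The level names follow the release statistics (level 5 "Uncertain"). The function names are ours, and the flat-file line layout is an assumption to check against the user manual.

```python
import re
from collections import Counter

PE_LEVELS = {
    1: "Evidence at protein level",
    2: "Evidence at transcript level",
    3: "Inferred from homology",
    4: "Predicted",
    5: "Uncertain",
}
PE_LINE = re.compile(r"^PE\s+([1-5]):")


def protein_existence(flat_entry: str) -> tuple[int, str]:
    """Return (level, label) from the PE line of one flat-file entry."""
    for line in flat_entry.splitlines():
        match = PE_LINE.match(line)
        if match:
            level = int(match.group(1))
            return level, PE_LEVELS[level]
    raise ValueError("no PE line found")


def pe_distribution(levels: list[int]) -> dict[int, float]:
    """Percentage of entries at each PE level (only levels that occur)."""
    counts = Counter(levels)
    return {level: 100 * counts[level] / len(levels) for level in sorted(counts)}
```

Keeping only level-1 entries, for instance, restricts a dataset to proteins with evidence at the protein level.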
56 Annotation: the focal point of our efforts to maintain and develop UniProtKB/Swiss-Prot. It enables individual researchers to obtain a summary of what is known about a protein.
57 In a UniProtKB/Swiss-Prot entry, you can expect to find:
- a (often corrected) protein sequence and the description of its various isoforms/variants;
- its biological origin, with links to the taxonomic databases;
- all the names of a given protein (and of its gene);
- a summary of what is known about the protein: function, alternative products, PTMs, tissue expression, disease, 3D data, etc.;
- a description of important sequence features: domains, PTMs, variations, etc.;
- a selection of references;
- selected keywords;
- numerous cross-references (central hub).
58 An easy way to access the history of a protein sequence entry UniSave homepage:
61 Other UniProt databases
63 UniRef
64 UniRef: useful for comprehensive BLAST searches, because it provides sets of representative sequences («collapsing BLAST results»). Three collections of sequence clusters built from the UniProt Knowledgebase and from Ensembl, IPI and EMBL_WGS:
- UniRef100: one entry groups all identical sequences (identical sequences and sub-fragments of 11 or more residues are placed in a single record) -> 12% reduction
- UniRef90: one entry groups sequences with at least 90% identity -> 40% reduction
- UniRef50: one entry groups sequences with at least 50% identity -> 65% reduction
Clusters are built independently of the species!
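The UniRef100 rule stated above can be sketched directly. Identical sequences, and sub-fragments of at least 11 residues, are collapsed into the cluster of a longer sequence. The sketch is a naive quadratic loop and the sequence IDs are made up. UniRef90 and UniRef50 need an identity-based clustering step that this sketch does not attempt.

```python
def collapse_identical_and_fragments(
    sequences: dict[str, str], min_fragment: int = 11
) -> dict[str, list[str]]:
    """Toy UniRef100-style clustering: {representative id: [member ids]}.

    A sequence joins an existing cluster if it is identical to the representative,
    or if it has at least `min_fragment` residues and occurs inside the representative.
    """
    # Longest sequences first, so every fragment meets its longer parent first.
    order = sorted(sequences, key=lambda sid: (-len(sequences[sid]), sid))
    clusters: dict[str, list[str]] = {}
    for sid in order:
        seq = sequences[sid]
        for rep in clusters:
            rep_seq = sequences[rep]
            if seq == rep_seq or (len(seq) >= min_fragment and seq in rep_seq):
                clusters[rep].append(sid)
                break
        else:  # no matching representative: start a new cluster
            clusters[sid] = [sid]
    return clusters
```

Species are ignored, exactly as on the slide. Two identical sequences from different organisms end up in the same cluster.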
65 UniParc
67 UniParc: the UniProt Archive (UniParc) is part of the UniProt project. It is a non-redundant archive of protein sequences extracted from public databases: UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, PIR-PSD, EMBL, EMBL WGS, Ensembl, IPI, PDB, RefSeq, FlyBase, WormBase, H-Invitational Database, TROME, European Patent Office proteins, United States Patent and Trademark Office (USPTO) proteins and Japan Patent Office proteins. UniParc contains only protein sequences; all other information about a protein must be retrieved from the source databases through the cross-references. Each unique sequence is stored only once, with a stable identifier. The identifier format is UPI followed by ten hexadecimal characters.
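The core UniParc idea is that each unique sequence is stored once, under a stable identifier, with cross-references to every source that reported it. It can be sketched as a dictionary keyed on the sequence. For illustration the sketch assumes that identifiers are handed out sequentially. The class name and the accessions in the usage note are placeholders.

```python
import re

UPI_PATTERN = re.compile(r"^UPI[0-9A-F]{10}$")  # "UPI" + ten hexadecimal characters


class SequenceArchive:
    """Toy UniParc-like archive: one stable UPI-style identifier per unique sequence."""

    def __init__(self) -> None:
        self._by_sequence: dict[str, str] = {}
        self.xrefs: dict[str, list[tuple[str, str]]] = {}

    def add(self, sequence: str, source_db: str, accession: str) -> str:
        """Record a sequence reported by `source_db`; return its identifier."""
        seq = sequence.upper()
        upi = self._by_sequence.get(seq)
        if upi is None:  # new sequence: assign the next identifier (assumed sequential)
            upi = f"UPI{len(self._by_sequence) + 1:010X}"
            self._by_sequence[seq] = upi
            self.xrefs[upi] = []
        self.xrefs[upi].append((source_db, accession))
        return upi
```

Adding the same sequence from Swiss-Prot and from RefSeq returns the same identifier twice. The archive then holds one sequence with two cross-references, which is the non-redundancy described above.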
68 UniParc: use with extreme caution! It also contains pseudogenes, incorrect CDS predictions, etc., as well as patent office data (EPO, ESPO).
69 Not downloadable
70 UniMES
71 The UniProt Metagenomic and Environmental Sequences (UniMES) is a repository specifically developed for metagenomic and environmental data. UniMES is available in FASTA format on the UniProt FTP servers, in the new subdirectory current_release/unimes: ftp.uniprot.org/pub/databases/uniprot, ftp.ebi.ac.uk/pub/databases/uniprot, ftp.expasy.org/databases/uniprot
73 NCBInr (Entrez protein)
74 Protein sequences: «NR» Entrez protein
76 The sources of NCBI-nr (Swiss-Prot + GenPept + (PIR) + RefSeq + PDB + PRF):
- PRF: sequences derived from scientific publications («journal scan»; integrated into TrEMBL)
- PIR: all PIR data have been integrated into Swiss-Prot and TrEMBL (UniProt)
- GenPept: derived from GenBank/EMBL/DDBJ sequences that have an annotated CDS; equivalent to TrEMBL, except that it is redundant with Swiss-Prot
- PDB (3D structure): all the protein sequences that have been crystallized (Swiss-Prot/TrEMBL are cross-linked to PDB)
77 RefSeq
79 RefSeq: the Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts and proteins.
- 3,648,590 entries (22-May-2007); 4,300 species
- 5,590,364 entries (11-July-2008); 5,395 species
- 6,042,750 entries (20-November-2008); 5,726 species
Accession number prefixes: NM_ (RNA), NT_ (genomic), NP_ (protein), XP_ (predicted protein).
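A RefSeq accession carries its record type in the prefix and, optionally, a version after the dot. The sketch below splits an accession into prefix, type, number and version. It knows only the prefixes mentioned in these slides (RefSeq defines more). `parse_refseq` is a hypothetical helper, and the accessions in the test are placeholders.

```python
import re

# Prefixes mentioned in the slides only; RefSeq defines others not listed here.
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NT": "genomic",
    "NP": "protein",
    "XM": "predicted mRNA",
    "XP": "predicted protein",
}
REFSEQ_AC = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def parse_refseq(accession: str) -> dict:
    """Split 'NP_123456.2' into prefix, record type, number and version (None if absent)."""
    match = REFSEQ_AC.match(accession.strip())
    if match is None or match.group(1) not in REFSEQ_PREFIXES:
        raise ValueError(f"not a recognised RefSeq accession: {accession!r}")
    prefix, number, version = match.groups()
    return {
        "prefix": prefix + "_",
        "type": REFSEQ_PREFIXES[prefix],
        "number": number,
        "version": int(version) if version else None,
        "predicted": prefix.startswith("X"),  # XM_/XP_ records come from gene prediction
    }
```

The `predicted` flag is a cheap way to separate curated NP_ records from XP_ predictions before a proteomics search.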
80-81 Example entry (screenshots) with callouts: AC, KW, Taxonomy, References.
83 PIR
84 PIR: the Protein Identification Resource. PIR-PSD is no longer updated, but exists as an archive.
85 PDB
86 PDB (Protein Data Bank): 3D structures. Contains the spatial coordinates of the atoms of macromolecules whose 3D structure has been obtained by X-ray or NMR studies, together with the corresponding protein sequences. *The PIR-NRL3D makes the sequence information in PDB available for similarity searches and other tools. Also includes mutated protein sequences (effect of a mutation on the 3D structure).
87 PDB: Protein Data Bank. Managed by the Research Collaboratory for Structural Bioinformatics (RCSB) (USA). Associated with specialized programs that allow visualization of the corresponding 3D structure (e.g. Swiss-PdbViewer, Chime, RasMol). Currently there are structural data for about different proteins, but far fewer protein families (highly redundant)!
88 PDB: example
89 PDB entry example (figure callouts): sequence; coordinates of each atom.
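The atom coordinates in a PDB-format file sit in fixed columns of the ATOM/HETATM records. The sketch reads the serial number, atom name, residue name, chain, residue number and x/y/z. It uses the column positions of the legacy PDB format (x in columns 31-38, y in 39-46, z in 47-54). The names `Atom` and `parse_atoms` are ours.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Read ATOM/HETATM records using the fixed columns of the PDB format (0-based slices)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6] in ("ATOM  ", "HETATM"):
            atoms.append(Atom(
                serial=int(line[6:11]),
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms
```

Splitting on whitespace is tempting but breaks when adjacent fields touch, so fixed columns are the safer choice for this format.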
90 Visualisation with Jmol
91 PRF
92 PRF looks for the peptide sequences described in publications (which are not submitted to databases!).
94 Query at Entrez protein (NCBInr)
95 Typical result of a query at «Entrez protein»: hits from RefSeq, GenPept (gb/embl/ddbj), PIR, Swiss-Prot and PDB.
96 AC GenInfo identifier number
97 GI number (GenInfo identifier number):
- In addition to the AC number specific to the original database, each protein sequence in NCBInr has a GI number.
- If the sequence changes in any way, a new GI number is assigned -> not a stable identifier.
- A separate GI number is also assigned to each protein translation within a nucleotide sequence record (alternative products).
- A Sequence Revision History tool is available to track the various GI numbers, version numbers and update dates for sequences that appeared in a specific GenBank record.
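Older NCBI nr FASTA files carried the GI number and the source database in the header, as in `>gi|<GI>|ref|<accession>| description`. Newer downloads may no longer include GI numbers. The sketch below splits such a legacy header and names the source with the database codes seen in Entrez results (ref, gb, emb, dbj, sp, pir, pdb, prf). It handles one identifier per header, not the concatenated multi-source nr headers. The GI values in the test are placeholders.

```python
DB_CODES = {
    "ref": "RefSeq",
    "gb": "GenBank (GenPept)",
    "emb": "EMBL",
    "dbj": "DDBJ",
    "sp": "Swiss-Prot",
    "pir": "PIR",
    "pdb": "PDB",
    "prf": "PRF",
}


def parse_legacy_defline(header: str) -> dict:
    """Split '>gi|<number>|<db>|<accession>|<locus> description' into its parts."""
    ids, _, description = header.lstrip(">").partition(" ")
    fields = ids.split("|")
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError(f"not a legacy gi| header: {header!r}")
    return {
        "gi": int(fields[1]),  # changes whenever the sequence changes: not stable
        "source": DB_CODES.get(fields[2], fields[2]),
        "accession": fields[3] or None,  # empty for some sources (e.g. PIR, PRF)
        "locus": fields[4] if len(fields) > 4 and fields[4] else None,
        "description": description,
    }
```

Because the GI changes with every sequence change, the accession (with its version) is the better key for long-term references.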
98 Menu: Introduction: bioinformatics and sequence databases; Nucleic acid sequence databases; Protein sequence databases (sources); Protein sequence databases (other)
99 EnsEMBL not only for proteins.
100 EnsEMBL Automated genome annotation and subsequent visualisation of annotated genomes. Ensembl concentrates on vertebrate genomes, but other groups have adapted the system for use with plant and fungal genomes.
101 EnsEMBL:
- aligns the genomic sequences with all the sequences found in EMBL, UniProtKB/Swiss-Prot, RefSeq and UniProtKB/TrEMBL (-> known genes)
- also performs gene prediction (-> novel genes)
- DNA, RNA and protein sequences available for ~30 species
- browsing tool
103 Browsing tool available for 49 species
105 CCDS Consensus CDS protein set
107 CCDS (human): combines different approaches (ab initio, by similarity) and takes advantage of the expertise acquired by different institutes, including manual annotation. Consensus between 4 institutions.
109 IPI International Protein Index
111 IPI (International Protein Index) provides a guide to the main databases that describe the human, mouse, rat, zebrafish, Arabidopsis, chicken and cow proteomes: Swiss-Prot, TrEMBL, RefSeq and Ensembl (plus H-InvDB, TAIR and VEGA). IPI is built to provide maximum coverage of the major publicly available protein (and gene) databases for the same protein. For each protein in IPI, an entry from one of the constituent databases is selected as the master entry and supplies the IPI entry with its sequence and annotation. Stable identifiers (with incremental versioning) are maintained so that sequences can be tracked in IPI between releases.
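Two of the mechanisms described for IPI can be sketched together. One is picking a master entry among the member entries. The other is a stable identifier whose version is incremented only when the master sequence changes. IPI's actual selection rules are not given on the slide. The priority order below (Swiss-Prot, RefSeq, Ensembl, TrEMBL) is a hypothetical choice made for illustration, and the identifiers in the test are placeholders.

```python
# Hypothetical source priority for this sketch only; not IPI's documented rule.
PRIORITY = ["Swiss-Prot", "RefSeq", "Ensembl", "TrEMBL"]


def choose_master(members: list[tuple[str, str, str]]) -> tuple[str, str, str]:
    """members: (source database, accession, sequence); best priority wins, then accession."""
    def rank(member: tuple[str, str, str]) -> tuple[int, str]:
        source, accession, _ = member
        return (PRIORITY.index(source) if source in PRIORITY else len(PRIORITY), accession)
    return min(members, key=rank)


class StableIds:
    """One stable identifier per protein; its version increases when the master sequence changes."""

    def __init__(self) -> None:
        self.current: dict[str, tuple[int, str]] = {}  # id -> (version, master sequence)

    def release(self, stable_id: str, members: list[tuple[str, str, str]]) -> str:
        _, _, sequence = choose_master(members)
        version, old_sequence = self.current.get(stable_id, (0, ""))
        if sequence != old_sequence:
            version += 1
        self.current[stable_id] = (version, sequence)
        return f"{stable_id}.{version}"
```

Running two releases with an unchanged master sequence keeps the same versioned ID. Changing the master sequence bumps the version, which is what lets users track a protein across releases.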
113 IMGT (the international ImMunoGeneTics information system) is a collection of high-quality integrated databases specialising in immunoglobulins, T cell receptors and the Major Histocompatibility Complex (MHC) of all vertebrate species.
116 Protein sequence databases for proteomics
117 Tools and the databases they search:
- Phenyx: UniProtKB
- PROWL: NCBInr, Swiss-Prot, dbEST
- Protein Prospector: NCBInr, Swiss-Prot, dbEST, GenPept, Ludwignr, OWL*
- PeptIdent (Aldente): UniProtKB
- Mascot: NCBInr, Swiss-Prot, dbEST, OWL*, MSDB
EST sequences are translated in the 6 frames (ESTs are not associated with annotated CDSs!).
* OWL has been obsolete since 1999.
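Since ESTs have no annotated CDS, they are searched through their translations in all six reading frames: three frames on the given strand and three on the reverse complement. A minimal sketch with the standard genetic code (NCBI table 1) follows. Frame naming conventions vary between tools. Here "-1" starts at the first base of the reverse complement. The function names are ours.

```python
BASES = "TCAG"
# NCBI translation table 1; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
INDEX = {base: n for n, base in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate_frame(seq: str) -> str:
    """Translate from the first base; '*' = stop, 'X' = codon with an ambiguous base."""
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if all(base in INDEX for base in codon):
            protein.append(STANDARD_CODE[16 * INDEX[codon[0]] + 4 * INDEX[codon[1]] + INDEX[codon[2]]])
        else:
            protein.append("X")
    return "".join(protein)


def six_frame_translation(est: str) -> dict[str, str]:
    """Return the six conceptual translations of an EST, keyed '+1'..'+3' and '-1'..'-3'."""
    seq = est.upper().replace("U", "T")
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(seq[offset:])
        frames[f"-{offset + 1}"] = translate_frame(reverse_complement[offset:])
    return frames
```

Six translations per EST also mean six times more candidate peptides, which increases search time and the chance of random matches.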
118 OWL: non-redundant protein database, including Swiss-Prot, PIR, NRL-3D* and GenPept. *The PIR-NRL3D makes the sequence information in PDB available for similarity searches.
121 ID/AC mapping
122 -> Accession/version number jungle! Depending on the database, an AC number is associated either with an entry (a gene product: stable even if the sequence changes) or with a sequence (it changes as soon as the sequence changes).
123 In summary, for the same protein sequence you can find: a UniProtKB/Swiss-Prot entry, a RefSeq (or GenPept) entry, an EnsEMBL entry, a CCDS entry, a UniParc entry (archive) and an IPI entry.
124 The AC number jungle (type of record: sample accession format):
- GenBank/EMBL/DDBJ: one letter followed by five digits, e.g. U12345; or two letters followed by six digits, e.g. AF
- Swiss-Prot/TrEMBL: one letter followed by five digits/letters, e.g. P12345, A0B533
- RefSeq nucleotide: two letters, an underscore and six digits, e.g. NM_ (mRNA), NT_ (genomic)
- RefSeq protein: e.g. NP_00483
- RefSeq prediction: e.g. XM_, XP_
- PDB (protein structure): one digit followed by three letters or digits, e.g. 1TUP
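The formats above can be turned into a rough accession classifier. Formats overlap, so the function returns every record type whose pattern matches. "P12345", for instance, fits both the one-letter-plus-five-digits GenBank/EMBL/DDBJ format and the UniProtKB format. The UniProtKB pattern is the regular expression given in UniProt's accession-number documentation, as far as we know. The other patterns follow the slide. Treat all of them as assumptions to verify.

```python
import re

PATTERNS = [
    ("GenBank/EMBL/DDBJ", re.compile(r"^[A-Z]\d{5}(\.\d+)?$")),
    ("GenBank/EMBL/DDBJ", re.compile(r"^[A-Z]{2}\d{6}(\.\d+)?$")),
    ("UniProtKB", re.compile(
        r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
    ("RefSeq mRNA", re.compile(r"^NM_\d+(\.\d+)?$")),
    ("RefSeq genomic", re.compile(r"^NT_\d+(\.\d+)?$")),
    ("RefSeq protein", re.compile(r"^NP_\d+(\.\d+)?$")),
    ("RefSeq predicted mRNA", re.compile(r"^XM_\d+(\.\d+)?$")),
    ("RefSeq predicted protein", re.compile(r"^XP_\d+(\.\d+)?$")),
    ("PDB", re.compile(r"^[0-9][A-Z0-9]{3}$")),
]


def classify_accession(accession: str) -> list[str]:
    """Return the sorted record types whose accession format matches (may be several)."""
    ac = accession.strip().upper()
    return sorted({name for name, pattern in PATTERNS if pattern.match(ac)})
```

An ambiguous result is a reminder that the format alone cannot always tell which database an identifier belongs to. The mapping services of the databases themselves remain the reliable route.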
125 uniprot.org
127 UniProtKB and PTMs
129 Chemical aspects: post-translational modifications (PTMs) consist of the breaking and/or making of covalent bonds, catalyzed by enzymes. PTMs modify both the protein mass and the isoelectric point (pI).
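The mass effect of a PTM is what a mass spectrometry search has to account for. The sketch computes the neutral monoisotopic mass of a peptide (sum of residue masses plus one water) and adds the mass shift of each listed modification. It uses standard monoisotopic values. Only four PTMs are included, and the pI effect is not modelled. The function name is ours.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini
# Monoisotopic mass shifts (Da) of a few common PTMs.
PTM_DELTA = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
    "oxidation": 15.99491,
}


def peptide_mass(sequence: str, ptms: tuple[str, ...] = ()) -> float:
    """Neutral monoisotopic mass of an unmodified peptide plus the given PTM shifts."""
    base = sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER
    return base + sum(PTM_DELTA[ptm] for ptm in ptms)
```

For example, `peptide_mass("PEPTIDE")` is about 799.360 Da. With one phosphorylation it is about 879.326 Da, so a search engine that ignores the modification will miss the peptide.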
130 The PTM variety (figure): modifications shown for each of the 20 amino acids (Gly, Ala, Val, Leu, Ile, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys, Ser, Thr, Met, Pro, Phe, Tyr, Trp).
- Side-chain modifications: acetylation, methylation, acylation, phosphorylation, oxidation, crosslinks, hydroxylation, cofactor binding, sulfation, C-linked, N-linked, O-linked and S-linked sugars
- N-terminal modifications: acetylation, methylation, acylation, crosslinks
- C-terminal modifications: GPI anchor, amidation, crosslinks, methylation
Colour key: black = cytoplasmic modifications; dark grey = cytoplasmic or extracellular, depending on the exact type; light grey = extracellular modifications.
131 PTM distribution among kingdoms (figure: diagram of the PTMs found in archaea, bacteria and eukaryotes). Labels shown: FMN binding, bacterial lipid anchor, pyrrolysine, bacteria-specific methylation, lanthionine crosslink, archaea-specific methylation, acetylation, archaean lipid anchor, phosphorylation, myristoylation, methylation, FAD binding, diphthamide, palmitoylation, GPI-anchor, amidation, sulfation, eukaryote-specific methylation.
132 PTM annotation in UniProtKB entries: PTMs are annotated in the feature table ("sequence annotation") when they can be assigned a position on the protein sequence, and in the comments when they cannot.
133 PTM-dedicated FT keys (FT key: usage):
- CARBOHYD (glycosylation): sugars
- DISULFID (disulfide bond): disulfide bonds
- CROSSLNK (cross-link): other crosslinks
- LIPID: lipids
- MOD_RES (modified residue): other modifications
PTMs are grouped by type and are specifically and uniquely annotated using a controlled vocabulary and a set of specific FT keys.
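Because each PTM type has its own FT key, PTM features can be pulled out of a flat-file entry by key. The sketch assumes the older one-line layout `FT   KEY   from   to   description`, with the key in columns 6-13 and continuation lines leaving that field blank. Later UniProtKB releases changed the FT line layout, so the parsing must be adapted to the release at hand. The function name and the sample entry in the test are ours.

```python
# PTM feature keys and their usage, as listed on the slide.
PTM_KEYS = {
    "CARBOHYD": "sugar",
    "DISULFID": "disulfide bond",
    "CROSSLNK": "other crosslink",
    "LIPID": "lipid",
    "MOD_RES": "other modification",
}


def ptm_features(flat_entry: str) -> list[dict]:
    """Collect PTM features from FT lines (older one-line layout assumed)."""
    features = []
    current = None  # last PTM feature, so continuation lines can extend it
    for line in flat_entry.splitlines():
        if not line.startswith("FT "):
            continue
        key = line[5:13].strip()
        if not key:  # continuation line: belongs to the previous feature
            if current is not None:
                current["description"] = f"{current['description']} {line[5:].strip()}".strip()
            continue
        parts = line.split(None, 4)
        if key not in PTM_KEYS or len(parts) < 4:
            current = None  # non-PTM feature: ignore it and its continuations
            continue
        current = {
            "key": key,
            "usage": PTM_KEYS[key],
            "start": parts[2],  # kept as text: positions may be '<1', '?', etc.
            "end": parts[3],
            "description": parts[4].strip() if len(parts) > 4 else "",
        }
        features.append(current)
    return features
```

For a disulfide bond, `start` and `end` are the two bonded cysteines rather than a single modified position.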
134 PTM annotation in UniProtKB entries (continued): associated keywords.
136 Find all mouse proteins which are phosphorylated
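Today the same question can be asked programmatically through the UniProt REST API. The sketch only builds the query URL, so it runs offline. Several parts are assumptions to check against the current UniProt documentation: the endpoint `https://rest.uniprot.org/uniprotkb/search`, the query fields `organism_id`, `keyword` and `reviewed`, and the keyword accession KW-0597 for "Phosphoprotein". 10090 is the NCBI taxonomy identifier of mouse. This is the modern service, not the interface shown on the slide.

```python
from urllib.parse import urlencode

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def phosphoprotein_query_url(taxon_id: int = 10090, reviewed_only: bool = True) -> str:
    """Build a UniProtKB search URL for one organism's proteins with the Phosphoprotein keyword."""
    query = f"organism_id:{taxon_id} AND keyword:KW-0597"  # KW-0597 assumed = Phosphoprotein
    if reviewed_only:
        query += " AND reviewed:true"  # restrict to Swiss-Prot entries
    params = {"query": query, "format": "tsv", "fields": "accession,id,protein_name"}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"
```

The URL can be opened with `urllib.request.urlopen`. Large result sets are paginated by the search endpoint, so check the documentation for the paging or streaming options before collecting everything.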
138 Number of PTMs in UniProtKB/Swiss-Prot release 51, all organisms (table; the counts are not recoverable). Columns: potential; by similarity; experimental & probable; total. Rows: signal peptide, N-GlcNAc, O-GalNAc, O-GlcNAc, phosphorylation, sulfation, myristate, GPI-anchor.
139 Resid
140 RESID: RESID is a database of 473 natural modifications with chemical and structural annotations such as recommended name and synonyms, delta mass, 3D structure, UniProt annotations, etc. FTP sites: ftp://ftp.ebi.ac.uk/pub/databases/resid/ and ftp://ftp.ncifcrf.gov/pub/users/residues
143 Other PTM databases: UNIMOD; PSI-MOD (ontology); Delta Mass.
144 GO
146 GO scope: three disjoint axes:
- cellular component: sub-cellular location, e.g. nucleus, ribosome, origin recognition complex
- molecular function: molecular role, e.g. catalytic activity, binding
- biological process: broad biological phenomena, e.g. mitosis, growth, digestion
147 GO structure: terms are related within a hierarchy and are linked by two relationships: is-a and part-of.
148 GO structure example (figure): the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships.
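Because terms are linked by is-a and part-of, an annotation to a specific term also implies annotation to all its ancestors. For example, a protein in the chloroplast membrane is also in a membrane, in the chloroplast and in the cell. The sketch collects ancestors by breadth-first traversal. The edges are an illustrative reading of the slide's example, not copied from the real Gene Ontology.

```python
from collections import deque

# Toy graph based on the slide's example terms; edges are illustrative, not real GO.
EDGES: dict[str, list[tuple[str, str]]] = {
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
    "mitochondrial membrane": [("is_a", "membrane")],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term: str, graph: dict[str, list[tuple[str, str]]] = EDGES) -> set[str]:
    """All terms reachable from `term` through is_a or part_of edges (breadth-first)."""
    seen: set[str] = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Here `ancestors("chloroplast membrane")` returns membrane, chloroplast and cell. A search for "membrane" proteins should therefore also return proteins annotated only to the more specific term.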
149 GOA: Gene Ontology Annotation. What is GOA? GOA aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase and the International Protein Index, using GO terms. The GOA project is run by the EBI and is a member of the GO consortium. In 2001, the first phase of the GOA project involved the large-scale assignment of GO terms to Swiss-Prot and TrEMBL entries using electronic methods, namely the mappings spkw2go, ec2go and InterPro2GO.
151 e-proxemis:
More informationAn Introduction to Bioinformatics for Biological Sciences Students
An Introduction to Bioinformatics for Biological Sciences Students Department of Microbiology and Immunology, McGill University Version 2.5 (For the BIOC-300 lab), March 2006 2 AN INTRODUCTION TO BIOINFORMATICS
More informationab initio and Evidence-Based Gene Finding
ab initio and Evidence-Based Gene Finding A basic introduction to annotation Outline What is annotation? ab initio gene finding Genome databases on the web Basics of the UCSC browser Evidence-based gene
More informationChapter 2: Access to Information
Chapter 2: Access to Information Outline Introduction to biological databases Centralized databases store DNA sequences Contents of DNA, RNA, and protein databases Central bioinformatics resources: NCBI
More informationMS bioinformatics analysis for proteomics. Protein anotations
MS bioinformatics analysis for proteomics Protein anotations UCO - Córdoba Organized by: ProteoRed, EUPA and Seprot Alberto Medina January, 23rd 2009 Summary Introduction Some issues Software: Fatigo -
More informationArray-Ready Oligo Set for the Rat Genome Version 3.0
Array-Ready Oligo Set for the Rat Genome Version 3.0 We are pleased to announce Version 3.0 of the Rat Genome Oligo Set containing 26,962 longmer probes representing 22,012 genes and 27,044 gene transcripts.
More informationBioinformatics for Cell Biologists
Bioinformatics for Cell Biologists 15 19 March 2010 Developmental Biology and Regnerative Medicine (DBRM) Schedule Monday, March 15 09.00 11.00 Introduction to course and Bioinformatics (L1) D224 Helena
More informationBasic concepts of molecular biology
Basic concepts of molecular biology Gabriella Trucco Email: gabriella.trucco@unimi.it Life The main actors in the chemistry of life are molecules called proteins nucleic acids Proteins: many different
More informationIntroduction to Bioinformatics CPSC 265. What is bioinformatics? Textbooks
Introduction to Bioinformatics CPSC 265 Thanks to Jonathan Pevsner, Ph.D. Textbooks Johnathan Pevsner, who I stole most of these slides from (thanks!) has written a textbook, Bioinformatics and Functional
More informationENZYMES AND METABOLIC PATHWAYS
ENZYMES AND METABOLIC PATHWAYS This document is licensed under the Attribution-NonCommercial-ShareAlike 2.5 Italy license, available at http://creativecommons.org/licenses/by-nc-sa/2.5/it/ 1. Enzymes build
More informationEBI web resources I: databases and tools. Yanbin Yin Spring 2013
EBI web resources I: databases and tools Yanbin Yin Spring 2013 1 Outline Intro to EBI Databases and web tools UniProt Gene Ontology Hands on PracBce MOST MATERIALS ARE FROM: hkp://www.ebi.ac.uk/training/online/course-
More informationBioinformatics Practical Course. 80 Practical Hours
Bioinformatics Practical Course 80 Practical Hours Course Description: This course presents major ideas and techniques for auxiliary bioinformatics and the advanced applications. Points included incorporate
More informationEE550 Computational Biology
EE550 Computational Biology Week 1 Course Notes Instructor: Bilge Karaçalı, PhD Syllabus Schedule : Thursday 13:30, 14:30, 15:30 Text : Paul G. Higgs, Teresa K. Attwood, Bioinformatics and Molecular Evolution,
More informationSequence Databases. Chapter 2. caister.com/bioinformaticsbooks. Paul Rangel. Sequence Databases
Chapter 2 Paul Rangel Abstract DNA and Protein sequence databases are the cornerstone of bioinformatics research. DNA databases such as GenBank and EMBL accept genome data from sequencing projects around
More informationIntroduction to BIOINFORMATICS
Introduction to BIOINFORMATICS Antonella Lisa CABGen Centro di Analisi Bioinformatica per la Genomica Tel. 0382-546361 E-mail: lisa@igm.cnr.it http://www.igm.cnr.it/pagine-personali/lisa-antonella/ What
More informationDatabases in genomics
Databases in genomics Search in biological databases: The most common task of molecular biologist researcher, to answer to the following ques7ons:! Are they new sequences deposited in biological databases
More informationProteomics databases
Proteomics databases and protein characterization tools Marie-Claude.Blatter@ISB-SIB.ch Part I Proteomics databases Proteomics databases 1. Sequence databases: «The story of a protein sequence s life»
More informationWeb-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide.
Page 1 of 18 Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide. When and Where---Wednesdays 1-2pm Room 438 Library Admin Building Beginning September
More informationProblem Set Unit The base ratios in the DNA and RNA for an onion (Allium cepa) are given below.
Problem Set Unit 3 Name 1. Which molecule is found in both DNA and RNA? A. Ribose B. Uracil C. Phosphate D. Amino acid 2. Which molecules form the nucleotide marked in the diagram? A. phosphate, deoxyribose
More informationWhat is a database? biological databases. An introduction to. A collection of. Includes also associated tools (software) data
An introduction to biological databases Marie-Claude.Blatter@isb-sib.ch A collection of What is a database? structured searchable (index) -> table of contents updated periodically (release) -> new edition
More informationBasic concepts of molecular biology
Basic concepts of molecular biology Gabriella Trucco Email: gabriella.trucco@unimi.it What is life made of? 1665: Robert Hooke discovered that organisms are composed of individual compartments called cells
More informationPROTEOINFORMATICS OVERVIEW
PROTEOINFORMATICS OVERVIEW August 11th 2016 Pratik Jagtap Center for Mass Spectrometry and Proteomics http://www.cbs.umn.edu/msp Outline PROTEOMICS WORKFLOW PEAKLIST PROCESSING Search Databases Overview
More informationNCBI Molecular Biology Resources. Entrez & BLAST. Entrez: Database Integration. Database Searching with Entrez. WWW Access. Using Entrez.
NCBI Molecular Biology Resources Using Entrez WWW Access Entrez & BLAST March 2007 Phylogeny Entrez: Database Integration Taxonomy PubMed abstracts Genomes Word weight 3-D Structure VAST Neighbors Related
More informationCenter for Mass Spectrometry and Proteomics Phone (612) (612)
Outline Database search types Peptide Mass Fingerprint (PMF) Precursor mass-based Sequence tag Results comparison across programs Manual inspection of results Terminology Mass tolerance MS/MS search FASTA
More informationFrom assembled genome to annotated genome
From assembled genome to annotated genome Procaryotic genomes Eucaryotic genomes Genome annotation servers (web based) 1. RAST 2. NCBI Gene prediction pipeline: Maker Function annotation pipeline: Blast2GO
More informationSequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University
Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Usage scenarios for sequence based function annotation Function prediction of newly cloned
More informationAccess to Information from Molecular Biology and Genome Research
Future Needs for Research Infrastructures in Biomedical Sciences Access to Information from Molecular Biology and Genome Research DG Research: Brussels March 2005 User Community for this information is
More informationBioinformatic Tools. So you acquired data.. But you wanted knowledge. So Now What?
Bioinformatic Tools So you acquired data.. But you wanted knowledge So Now What? We have a series of questions What the Heck is That Ion? How come my MW does not match? How do I make a DB to search against?
More informationRetrieval of gene information at NCBI
Retrieval of gene information at NCBI Some notes 1. http://www.cs.ucf.edu/~xiaoman/fall/ 2. Slides are for presenting the main paper, should minimize the copy and paste from the paper, should write in
More informationIntroduction to 'Omics and Bioinformatics
Introduction to 'Omics and Bioinformatics Chris Overall Department of Bioinformatics and Genomics University of North Carolina Charlotte Acquire Store Analyze Visualize Bioinformatics makes many current
More informationZool 3200: Cell Biology Exam 3 3/6/15
Name: Trask Zool 3200: Cell Biology Exam 3 3/6/15 Answer each of the following questions in the space provided; circle the correct answer or answers for each multiple choice question and circle either
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Brad Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org 8
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Brad Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org 7
More informationglycosylphosphatidylinositol (GPI)- anchor;
Claire O Donovan is the large-scale annotation coordinator and is responsible for the TrEMBL database production at the EMBL Outstation EBI. Maria Jesus Martin coordinates software development and is responsible
More informationAlgorithms in Bioinformatics ONE Transcription Translation
Algorithms in Bioinformatics ONE Transcription Translation Sami Khuri Department of Computer Science San José State University sami.khuri@sjsu.edu Biology Review DNA RNA Proteins Central Dogma Transcription
More informationImportant gene-information's
Sequences, domains and databases. How to gather information on a gene. Jens Bohnekamp, Institute for Biochemistry Important gene-information's Protein sequence Nucleotide sequence Gene structure Protein
More informationChimp Sequence Annotation: Region 2_3
Chimp Sequence Annotation: Region 2_3 Jeff Howenstein March 30, 2007 BIO434W Genomics 1 Introduction We received region 2_3 of the ChimpChunk sequence, and the first step we performed was to run RepeatMasker
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org kcoombes@mdanderson.org
More informationWhat You NEED to Know
What You NEED to Know Major DNA Databases NCBI RefSeq EBI DDBJ Protein Structural Databases PDB SCOP CCDC Major Protein Sequence Databases UniprotKB Swissprot PIR TrEMBL Genpept Other Major Databases MIM
More informationBasic protein and peptide science for proteomics. Henrik Johansson
Basic protein and peptide science for proteomics Henrik Johansson Proteins are the main actors in the cell Membranes Transport and storage Chemical factories DNA Building proteins Structure Proteins mediate
More informationData Retrieval from GenBank
Data Retrieval from GenBank Peter J. Myler Bioinformatics of Intracellular Pathogens JNU, Feb 7-0, 2009 http://www.ncbi.nlm.nih.gov (January, 2007) http://ncbi.nlm.nih.gov/sitemap/resourceguide.html Accessing
More informationRESEARCH METHODOLOGY, BIOSTATISTICS AND IPR
MB 401: RESEARCH METHODOLOGY, BIOSTATISTICS AND IPR Objectives: The overall aim of the course is to deepen knowledge regarding basic concepts of Biostatistics, the research process in occupational therapy
More informationIntroduc)on to Databases and Resources Biological Databases and Resources
Introduc)on to Bioinforma)cs Online Course : IBT Introduc)on to Databases and Resources Biological Databases and Resources Learning Objec)ves Introduc)on to Databases and Resources - Understand how bioinforma)cs
More informationGenome Resources. Genome Resources. Maj Gen (R) Suhaib Ahmed, HI (M)
Maj Gen (R) Suhaib Ahmed, I (M) The human genome comprises DNA sequences mostly contained in the nucleus. A small portion is also present in the mitochondria. The nuclear DNA is present in chromosomes.
More informationDatabases/Resources on the web
Databases/Resources on the web Jon K. Lærdahl jonkl@medisin.uio.no A lot of biological databases available on the web... MetaBase, the database of biological databases (1801 entries) - h p://metadatabase.org
More informationDe novo sequencing in the identification of mass data. Wang Quanhui Liu Siqi Beijing Institute of Genomics, CAS
De novo sequencing in the identification of mass data Wang Quanhui Liu Siqi Beijing Institute of Genomics, CAS The difficulties in mass data analysis Although the techniques of genomic sequencing are being
More informationThis software/database/presentation is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part
This software/database/presentation is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part of the author's official duties as a United States Government
More informationProtein Structure Databases, cont. 11/09/05
11/9/05 Protein Structure Databases (continued) Prediction & Modeling Bioinformatics Seminars Nov 10 Thurs 3:40 Com S Seminar in 223 Atanasoff Computational Epidemiology Armin R. Mikler, Univ. North Texas
More informationUnit 1. DNA and the Genome
Unit 1 DNA and the Genome Gene Expression Key Area 3 Vocabulary 1: Transcription Translation Phenotype RNA (mrna, trna, rrna) Codon Anticodon Ribosome RNA polymerase RNA splicing Introns Extrons Gene Expression
More informationLecture for Wednesday. Dr. Prince BIOL 1408
Lecture for Wednesday Dr. Prince BIOL 1408 THE FLOW OF GENETIC INFORMATION FROM DNA TO RNA TO PROTEIN Copyright 2009 Pearson Education, Inc. Genes are expressed as proteins A gene is a segment of DNA that
More informationGenome annotation & EST
Genome annotation & EST What is genome annotation? The process of taking the raw DNA sequence produced by the genome sequence projects and adding the layers of analysis and interpretation necessary
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics 260.602.01 September 1, 2006 Jonathan Pevsner, Ph.D. pevsner@kennedykrieger.org Teaching assistants Hugh Cahill (hugh@jhu.edu) Jennifer Turney (jturney@jhsph.edu) Meg Zupancic
More informationCompiled by Mr. Nitin Swamy Asst. Prof. Department of Biotechnology
Bioinformatics Model Answers Compiled by Mr. Nitin Swamy Asst. Prof. Department of Biotechnology Page 1 of 15 Previous years questions asked. 1. Describe the software used in bioinformatics 2. Name four
More informationBIMM 143: Introduction to Bioinformatics (Winter 2018)
BIMM 143: Introduction to Bioinformatics (Winter 2018) Course Instructor: Dr. Barry J. Grant ( bjgrant@ucsd.edu ) Course Website: https://bioboot.github.io/bimm143_w18/ DRAFT: 2017-12-02 (20:48:10 PST
More informationThis place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.
G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY Methods or systems for genetic
More informationBioinformatics for Molecular Biology
Bioinformatics for Molecular Biology Databases & Accessing data Today s Programme Biological databases Brief introduction What is UNIX? Why should you learn UNIX? Bioinformatics Core Facility Setting up
More informationNucleic acid and protein Flow of genetic information
Nucleic acid and protein Flow of genetic information References: Glick, BR and JJ Pasternak, 2003, Molecular Biotechnology: Principles and Applications of Recombinant DNA, ASM Press, Washington DC, pages.
More informationNCBI Molecular Biology Resources
NCBI Molecular Biology Resources Part 2: Using NCBI BLAST December 2009 Using BLAST Basics of using NCBI BLAST Using the new Interface Improved organism and filter options New Services Primer BLAST Align
More informationKlinisk kemisk diagnostik BIOINFORMATICS
Klinisk kemisk diagnostik - 2017 BIOINFORMATICS What is bioinformatics? Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological,
More informationBIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP
Jasper Decuyper BIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP MB&C2017 Workshop Bioinformatics for dummies 2 INTRODUCTION Imagine your workspace without the computers Both in research laboratories and in
More informationAnnotation. (Chapter 8)
Annotation (Chapter 8) Genome annotation Genome annotation is the process of attaching biological information to sequences: identify elements on the genome attach biological information to elements store
More informationFUNCTIONAL ANNOTATION
FUNCTIONAL ANNOTATION Benjamin Hsieh Emily Rogers >prot_contig_1 MGYRVGINCFDTRLQADDYLLSSLPPTVTQDGKI IRPERVGDKWILNGKPVTLSYPKCSNYEQVKSGA YLGSMVLILFVVIYGFRLLINFLKDIGKVGA Jin Hee Kim Jasreet Hundal Pushkala
More informationEnsembl workshop. Thomas Randall, PhD bioinformatics.unc.edu. handouts, papers, datasets
Ensembl workshop Thomas Randall, PhD tarandal@email.unc.edu bioinformatics.unc.edu www.unc.edu/~tarandal/ensembl handouts, papers, datasets Ensembl is a joint project between EMBL - EBI and the Sanger
More informationDina El-Khishin (Ph.D.) Bioinformatics Research Facility. Deputy Director of AGERI & Head of the Genomics, Proteomics &
Dina El-Khishin (Ph.D.) Deputy Director of AGERI & Head of the Genomics, Proteomics & Bioinformatics Research Facility Agricultural Genetic Engineering Research Institute (AGERI) Giza EGYPT Bioinformatics
More informationRedundancy at GenBank => RefSeq. RefSeq vs GenBank. Databases, cont. Genome sequencing using a shotgun approach. Sequenced eukaryotic genomes
Databases, cont. Redundancy at GenBank => RefSeq http://www.ncbi.nlm.nih.gov/books/bv.fcg i?rid=handbook RefSeq vs GenBank Many sequences are represented more than once in GenBank 2003 RefSeq collection
More informationGREG GIBSON SPENCER V. MUSE
A Primer of Genome Science ience THIRD EDITION TAGCACCTAGAATCATGGAGAGATAATTCGGTGAGAATTAAATGGAGAGTTGCATAGAGAACTGCGAACTG GREG GIBSON SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc.
More informationLecture 2 Introduction to Data Formats
Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 2 Introduction to Data Formats Introduction to Data Formats Real world, data and formats Sequences and
More informationThe Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica
The Ensembl Database Dott.ssa Inga Prokopenko Corso di Genomica 1 www.ensembl.org Lecture 7.1 2 What is Ensembl? Public annotation of mammalian and other genomes Open source software Relational database
More informationBioinformatics. ONE Introduction to Biology. Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012
Bioinformatics ONE Introduction to Biology Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012 Biology Review DNA RNA Proteins Central Dogma Transcription Translation
More information