What You NEED to Know
- Cory Jacob Bailey
- 5 years ago
Transcription
1 What You NEED to Know
- Major DNA Databases: NCBI, RefSeq, EBI, DDBJ
- Protein Structural Databases: PDB, SCOP, CCDC
- Major Protein Sequence Databases: UniProtKB, Swiss-Prot, PIR, TrEMBL, GenPept
- Other Major Databases: MIM (OMIM), KEGG, COG, GO
- Major Protein Domain Databases: InterPro, PROSITE, Pfam
- General Databases/Browsers: Taxonomy, UCSC, Ensembl
- Organism Specific
2 Organism-specific databases
7 Large-scale sequencing projects
8 While not large-scale sequencing, it is large-scale genotyping
13 Databases NCBI - ENTREZ
20 Data & Software Resources
BLAST, CDD, COG, GENSAT, GenBank, Whole Genome Shotgun Sequences, Gene, Gene Expression Nervous System Atlas (GENSAT), Gene Expression Omnibus (GEO) Profiles and Datasets, Genome, Genome Markers (UniSTS), HomoloGene, Mapping Data, NCBI Taxonomy, Protein Clusters, PubChem, RefSeq, SKY/M-FISH and CGH Data, Sequence Read Archive, FTP Site, Structure (MMDB), Trace Archive, UniGene, UniVec, GenPept, dbGaP Open Access Data, dbMHC Data, RSS Feeds, Sequin, tbl2asn, Batch Entrez, CDTree, Cn3D, E-Utilities, NCBI Toolbox, ProSplign, Splign
28 Just the upper left corner of moi
29 Just the lower left corner of moi
35 * is not a wildcard; it is a truncation symbol
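The truncation rule on this slide can be illustrated with a toy model. This is only a sketch of the semantics, not how Entrez is implemented: the function name `expand_truncation` and the small vocabulary are hypothetical, and the real service expands a truncated term against its own index.

```python
def expand_truncation(term, vocabulary):
    """Return the vocabulary entries a search term would cover.

    A trailing '*' means truncation: every word that begins with the
    stem is matched. Without it, only the exact word matches.
    (Hypothetical helper for illustration; not an Entrez API.)
    """
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return sorted(w for w in vocabulary if w.lower() == term.lower())


# Hypothetical index terms
vocabulary = ["kinase", "kinases", "kinetics", "cytokine", "kin"]

print(expand_truncation("kinas*", vocabulary))  # ['kinase', 'kinases']
print(expand_truncation("kin*", vocabulary))    # ['kin', 'kinase', 'kinases', 'kinetics']
```

Note that "cytokine" is never returned for `kin*`: truncation only extends the end of a stem, it does not match in the middle of a word the way a general wildcard would.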
41 Combine searches, e.g. #1 #2 NOT #3
45 Use of boolean terms for search
- Operators: AND, OR, NOT
- General syntax: term [field] OPERATOR term [field]
- Use of brackets to combine the terms
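The general syntax above can be sketched as a small query builder. The helper names `tagged` and `combine` are hypothetical, the search terms are made up for illustration, and the field tags (PROT, ORGN, TITL) are taken from the field table in this deck. The term is written here without a space before the bracketed field.

```python
def tagged(term, field):
    """Attach a field tag to a term, e.g. insulin[PROT]."""
    return f"{term}[{field}]"


def combine(operator, *parts):
    """Join query parts with one boolean operator, wrapped in brackets."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return "(" + f" {operator} ".join(parts) + ")"


# (insulin as protein name AND human as organism), excluding a title word
query = combine(
    "NOT",
    combine("AND", tagged("insulin", "PROT"), tagged("human", "ORGN")),
    tagged("precursor", "TITL"),
)
print(query)  # ((insulin[PROT] AND human[ORGN]) NOT precursor[TITL])
```

Bracketing every combination explicitly avoids relying on the order in which the search engine evaluates mixed operators.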
46 Available for Database
Field               Short term
Accession           ACCN
All Fields          ALL
Author Name         AUTH
EC/RN Number        ECNO
Feature Key         FKEY
Filter              FILT
Gene Name           GENE
Issue               ISS
Journal Name        JOUR
Keyword             KYWD
Modification Date   MDAT
Molecular Weight    MOLWT
Organism            ORGN
Page Number         PAGE
Primary Accession   PACC
Properties          PROP
Protein Name        PROT
Publication Date    PDAT
SeqID String        SQID
Sequence Length     SLEN
Substance Name      SUBS
Text Word           WORD
Title Word          TITL
Volume              VOL
Availability columns: Nucleotide, Protein, Genome, Structure, PopSet (18 cells marked NO; their positions are not preserved in the transcription)
47 PubMed ENTREZ search fields
Field                       Short term
Affiliation                 AD
Author                      AU
EC/RN Number                RN
Filter                      FILTER
Full Author Name            FAU
Issue                       IP
Journal Title               TA
MeSH Date                   MHDA
MeSH Subheadings            SH
NLM Unique ID               JID
Pagination                  PG
Pharmacological Action      PA
Publication Date            DP
Publisher Identifier        AID
Subset                      SB
Text Word                   TW
Title / Abstract            TIAB
Volume                      VI
All Fields                  ALL
Corporate Author            CN
Entrez Date                 EDAT
First Author                1AU
Grant Number                GR
Investigator                IR
Language                    LA
MeSH Major Topic            MAJR
MeSH Terms                  MH
Other Term                  OT
Personal Name as Subject    PS
Place of Publication        PL
Publication Type            PT
Secondary Source ID         SI
Substance Name              NM
Title                       TI
Unique Identifiers          UID
48 Can you find the enhancers/promoters for GLP3 (germin-like protein 3)?
52 Range operator ":" (ACCN, MOLWT, SLEN)
- Syntax: x:y[SLEN]
- Also works with dates and molecular weight
For more information:
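The range syntax on this slide can be sketched as follows. The helper `range_term` is hypothetical, and the bounds are made-up values; the date example assumes the YYYY/MM/DD form that Entrez uses for date fields.

```python
def range_term(low, high, field):
    """Build a range term of the form low:high[FIELD]."""
    if low > high:
        raise ValueError("lower bound must not exceed upper bound")
    return f"{low}:{high}[{field}]"


# Sequences between 100 and 200 residues long
print(range_term(100, 200, "SLEN"))  # 100:200[SLEN]

# Records published during 2005 (dates compare correctly as YYYY/MM/DD strings)
print(range_term("2005/01/01", "2005/12/31", "PDAT"))  # 2005/01/01:2005/12/31[PDAT]

# Molecular weight window (illustrative values)
print(range_term(10000, 20000, "MOLWT"))  # 10000:20000[MOLWT]
```

A range term behaves like any other tagged term, so it can be combined with AND, OR and NOT in a longer query.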
65 Display Format: Description (Databases Available)
- Summary: Default display; hotlinked Accession number and brief description (Nucleotide, Protein, CoreNucleotide, EST, GSS, PopSet, Genome, Genome Project)
- Brief: Hotlinked Accession number and abbreviated description; hotlinked project number in the case of a genome project (Nucleotide, Protein, CoreNucleotide, EST, GSS, PopSet, Genome, Genome Project)
- GenBank: Full report format (Nucleotide, Protein, CoreNucleotide, EST, GSS, Genome)
- GenPept: Full report format (Protein)
- GenBank (full): Complete GenBank record with all features and all sequence. This format is useful for very large GenBank records (Nucleotide, Protein, CoreNucleotide, EST, GSS, Genome)
- GenPept (full): Complete GenPept record with all protein features and all sequence. This format is useful for very large GenBank records (Protein)
66 Display Format: Description (Databases Available)
- INSDSeq XML: XML DTD for sequence records (Nucleotide, Protein)
- GI list: List of GenInfo (GI) identifiers (Nucleotide, Protein, CoreNucleotide, EST, GSS)
- ASN.1: Abstract Syntax Notation One, used for data storage and retrieval and to help achieve interoperability among platforms (Nucleotide, Protein, CoreNucleotide, EST, GSS, PopSet, Genome)
- EST: Native display format for Expressed Sequence Tag records (EST)
- Graphics or Graph: The graphical view of the sequence, accessible by selecting the hotlinked Accession numbers (Nucleotide, Protein, Genome)
- GSS: Native display format for the Genome Survey Sequences (GSS)
- TinySeq XML: Simplified XML for parsing (Nucleotide, Protein, CoreNucleotide, EST, GSS, Genome)
67 Display Format: Description (Databases Available)
- Overview: Tabular layout of data, including links to BLAST results, CDD, ftp site and general information for a genome in Genomes. For the Genome Project database it is a complete display of links to projects in the database and serves as a portal to all projects in the database about the organism-specific genome (Genome, Genome Project)
- PopSet summary: The set of Accession Numbers comprising the PopSet, accessible by selecting the hotlinked PopSet Accession Numbers (PopSet)
- UI List: List of database IDs (PopSet)
- XML: Script-parseable format (Nucleotide, Protein, Genome)
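Several of the display formats listed above can also be requested programmatically through the E-Utilities named on the resources slide. The sketch below only builds the request URL and does not contact NCBI. The `rettype`/`retmode` pairs are given as I understand the EFetch documentation and should be checked against it; the helper `efetch_url`, the `FORMATS` table and the record ID `EXAMPLE_ID` are hypothetical placeholders.

```python
from urllib.parse import urlencode

# EFetch endpoint of the NCBI E-Utilities
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Display format -> (rettype, retmode); assumed mapping, verify before use
FORMATS = {
    "GenBank": ("gb", "text"),
    "GenPept": ("gp", "text"),
    "TinySeq XML": ("fasta", "xml"),
}


def efetch_url(db, record_id, display_format):
    """Build an EFetch URL for one record in the chosen display format."""
    rettype, retmode = FORMATS[display_format]
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    return f"{EFETCH}?{urlencode(params)}"


# EXAMPLE_ID is a placeholder, not a real accession
print(efetch_url("nuccore", "EXAMPLE_ID", "GenBank"))
print(efetch_url("protein", "EXAMPLE_ID", "GenPept"))
```

Keeping the format names in one table mirrors the slide: the same record can be fetched in a different display format by changing a single argument.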
68 Text mining
70 Caveat emptor