What You NEED to Know

1 What You NEED to Know
Major DNA Databases: NCBI, RefSeq, EBI, DDBJ
Protein Structural Databases: PDB, SCOP, CCDC
Major Protein Sequence Databases: UniProtKB, Swiss-Prot, PIR, TrEMBL, GenPept
Other Major Databases: MIM (OMIM), KEGG, COG, GO
Major Protein Domain Databases: InterPro, PROSITE, Pfam
General Databases/Browsers: Taxonomy, UCSC, Ensembl
Organism Specific

2 Organism specific databases

7 Large-scale sequencing projects

8 While not large-scale sequencing, it is large-scale genotyping

13 Databases NCBI - ENTREZ

20 Data & Software Resources
BLAST, CDD, COG, GENSAT, GenBank, Whole Genome Shotgun Sequences, Gene,
Gene Expression Nervous System Atlas (GENSAT),
Gene Expression Omnibus (GEO) Profiles and Datasets, Genome,
Genome Markers (UniSTS), HomoloGene, Mapping Data, NCBI Taxonomy,
Protein Clusters, PubChem, RefSeq, SKY/M-FISH and CGH Data,
Sequence Read Archive, FTP Site, Structure (MMDB), Trace Archive,
UniGene, UniVec, GenPept, dbGaP Open Access Data, dbMHC Data, RSS Feeds,
Sequin, tbl2asn, Batch Entrez, CDTree, Cn3D, E-Utilities, NCBI Toolbox,
ProSplign, Splign

28 Just the upper left corner of moi

29 Just the lower left corner of moi

35 * is not a wildcard; it is a truncation symbol, placed at the end of a term root
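
The difference between truncation and a general wildcard can be sketched with a toy model. This is only an illustration, not how Entrez is implemented: the function name `expand_truncation` and the small list of index terms are made up for the example. A trailing `*` is treated as "any term that begins with this root", and nothing else.

```python
def expand_truncation(term, index_terms):
    """Return the index terms matched by `term`.

    A trailing * truncates: it matches every index term that starts
    with the root. Without *, only an exact match counts.
    """
    if term.endswith("*"):
        root = term[:-1].lower()
        return sorted(t for t in index_terms if t.lower().startswith(root))
    return sorted(t for t in index_terms if t.lower() == term.lower())


# Hypothetical index terms, chosen only to show the behaviour.
index_terms = ["kinase", "kinases", "kinesin", "protein kinase", "akinesia"]

matches = expand_truncation("kinas*", index_terms)
print(matches)  # ['kinase', 'kinases']
```

In this sketch "akinesia" and "protein kinase" are not returned, because the root must sit at the start of the term; a true wildcard could also stand for characters at the beginning or in the middle.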

41 Combine Searches
E.g. #1 #2 NOT #3

45 Use of boolean terms for search: AND, OR, NOT
General syntax: term [field] OPERATOR term [field]
Use of brackets to combine the terms
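
The general syntax above can be assembled programmatically. The following is a minimal sketch that only builds the query string; it does not contact Entrez. The helper names `field_term` and `combine` are hypothetical, and quoting multi-word terms as phrases is an assumption made for the example.

```python
def field_term(term, field=None):
    """Format one search term, optionally restricted to a field tag."""
    # Assumption: multi-word terms are quoted so they read as one phrase.
    text = f'"{term}"' if " " in term else term
    return f"{text}[{field}]" if field else text


def combine(operator, *parts):
    """Join query parts with AND, OR, or NOT and wrap them in brackets."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    if len(parts) < 2:
        raise ValueError("need at least two parts to combine")
    return "(" + f" {operator} ".join(parts) + ")"


# The inner brackets group the two organisms before the AND is applied.
organisms = combine(
    "OR",
    field_term("Homo sapiens", "ORGN"),
    field_term("Mus musculus", "ORGN"),
)
query = combine("AND", field_term("insulin", "PROT"), organisms)
print(query)
# (insulin[PROT] AND ("Homo sapiens"[ORGN] OR "Mus musculus"[ORGN]))
```

Wrapping every combination in brackets keeps the grouping explicit, so the result does not depend on the order in which the search engine evaluates the operators.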

46
Field                 Short term
Accession             ACCN
All Fields            ALL
Author Name           AUTH
EC/RN Number          ECNO
Feature Key           FKEY
Filter                FILT
Gene Name             GENE
Issue                 ISS
Journal Name          JOUR
Keyword               KYWD
Modification Date     MDAT
Molecular Weight      MOLWT
Organism              ORGN
Page Number           PAGE
Primary Accession     PACC
Properties            PROP
Protein Name          PROT
Publication Date      PDAT
SeqID String          SQID
Sequence Length       SLEN
Substance Name        SUBS
Text Word             WORD
Title Word            TITL
Volume                VOL
Available for Database: Nucleotide, Protein, Genome, Structure, PopSet (the per-field availability marks are not recoverable here)

47 PubMed ENTREZ search fields
Field                       Short term
Affiliation                 AD
Author                      AU
EC/RN Number                RN
Filter                      FILTER
Full Author Name            FAU
Issue                       IP
Journal Title               TA
MeSH Date                   MHDA
MeSH Subheadings            SH
NLM Unique ID               JID
Pagination                  PG
Pharmacological Action      PA
Publication Date            DP
Publisher Identifier        AID
Subset                      SB
Text Word                   TW
Title / Abstract            TIAB
Volume                      VI
All Fields                  ALL
Corporate Author            CN
Entrez Date                 EDAT
First Author                1AU
Grant Name                  GR
Investigator                IR
Language                    LA
MeSH Major Topic            MAJR
MeSH Terms                  MH
Other Term                  OT
Personal Name as Subject    PS
Place of Publication        PL
Publication Type            PT
Secondary Source ID         SI
Substance Name              NM
Title                       TI
Unique Identifiers          UID

48 Can you find the enhancers/promoters for GLP3 (germin-like protein 3)?

52 Range operator ":" (ACCN, MOLWT, SLEN)
x : y [SLEN]
Works with dates; molecular weight
For more information:
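
A range expression of the form x : y [FIELD] can be built the same way as any other query term. This is a small sketch under assumptions: the helper name `range_term` is hypothetical, and the YYYY/MM/DD date format is assumed for the date example.

```python
def range_term(low, high, field):
    """Build a range expression of the form low:high[FIELD]."""
    # Both bounds must be of the same type (two numbers or two date strings).
    if low > high:
        raise ValueError("lower bound must not exceed upper bound")
    return f"{low}:{high}[{field}]"


# Sequences between 500 and 1500 residues long.
length_filter = range_term(500, 1500, "SLEN")

# Records published during 2005 (date format is an assumption).
date_filter = range_term("2005/01/01", "2005/12/31", "PDAT")

print(length_filter)  # 500:1500[SLEN]
print(date_filter)    # 2005/01/01:2005/12/31[PDAT]
```

Either string can then be joined to other terms with AND, OR, or NOT like any other part of a query.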

65 Display Format / Description / Databases Available

Summary
  Description: Default display; hotlinked Accession number and brief description
  Databases: Nucleotide, Protein, CoreNucleotide, EST, GSS, PopSet, Genome, Genome Project

Brief
  Description: Hotlinked Accession number and abbreviated description; hotlinked project number in the case of a genome project
  Databases: Nucleotide, Protein, CoreNucleotide, EST, GSS, PopSet, Genome, Genome Project

GenBank
  Description: Full report format
  Databases: Nucleotide, Protein, CoreNucleotide, EST, GSS, Genome

GenPept
  Description: Full report format
  Databases: Protein

GenBank (full)
  Description: Complete GenBank record with all features and all sequence. This GenBank (full) format is useful for very large GenBank records
  Databases: Nucleotide, Protein, CoreNucleotide, EST, GSS, Genome

GenPept (full)
  Description: Complete GenPept record with all protein features and all sequence. This format is useful for very large GenBank records
  Databases: Protein

66 Display Format / Description / Databases Available

INSDSeq XML
  Description: XML DTD for sequence records
  Databases: Nucleotide, Protein

GI list
  Description: List of GenInfo (GI) identifiers
  Databases: Nucleotide, Protein, CoreNucleotide, EST, GSS

ASN.1
  Description: Abstract Syntax Notation One, used for data storage and retrieval and to help achieve interoperability among platforms
  Databases: Nucleotide, Protein, CoreNucleotide, EST, GSS, PopSet, Genome

EST
  Description: Native display format for Expressed Sequence Tag records
  Databases: EST

Graphics or Graph
  Description: The graphical view of the sequence, accessible by selecting the hotlinked Accession numbers
  Databases: Nucleotide, Protein and Genome

GSS
  Description: Native display format for the Genome Survey Sequences
  Databases: GSS

TinySeq XML
  Description: Simplified XML for parsing
  Databases: Nucleotide, Protein, CoreNucleotide, EST, GSS, Genome

67 Display Format / Description / Databases Available

Overview
  Description: Tabular layout of data including links to BLAST results, CDD, FTP site and general information for a genome in Genomes; for the Genome Project database it is a complete display of links to projects in the database and serves as a portal to all projects in the database about the organism-specific genome
  Databases: Genome, Genome Project

PopSet summary
  Description: The set of Accession Numbers comprising the PopSet, accessible by selecting the hotlinked PopSet Accession Numbers
  Databases: PopSet

UI List
  Description: List of database IDs
  Databases: PopSet

XML
  Description: Script-parseable format
  Databases: Nucleotide, Protein, Genome

68 Text mining

70 Caveat emptor