Introduction to Databases and Resources: Biological Databases and Resources
- Arlene Hawkins
- 5 years ago
1 Introduction to Bioinformatics Online Course: IBT. Introduction to Databases and Resources: Biological Databases and Resources
2 Learning Objectives: Introduction to Databases and Resources
- Understand how bioinformatics data is stored and organised
- Describe the different types of data found at the NCBI and EBI resources
- Locate key bioinformatics databases and resources
3 Learning Outcomes: Introduction to Databases and Resources
- Understand the structure and layout of the NCBI and EBI data resources
- Understand the difference between databases, tools and repositories
- Search for data in specific databases using accession numbers and gene names
- Use selected tools at NCBI and EBI
4 Data
5 Introduction
- There is a range of different online databases and resources
- You need to know:
  - Which databases and resources exist
  - What tools are available to mine these resources
  - What tools are available to search across resources
6 Biological databases
7 Nucleic Acids Research
8 Databases
Databases can be:
- Public or private (in terms of access and submission)
- Protein, nucleotide, structure, literature or annotation databases
- Generalised or specialised
- Curated or non-curated
- Sequence- or genome-centred
9 Primary Databases
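10 Primary Databases
- International Nucleotide Sequence Database Collaboration (INSDC)
- Genomic sequence data is stored in three public databases: GenBank (NCBI), the European Nucleotide Archive (ENA, at EMBL-EBI) and the DNA Data Bank of Japan (DDBJ)
- Each has its own accession numbers and tools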
11 Secondary Databases
- In-depth databases built upon primary sequence data
- Provide several different resources and annotations
12 Most Popular Bioinformatics Resources
- National Center for Biotechnology Information (NCBI)
- European Bioinformatics Institute (EMBL-EBI)
13 NCBI
- National Center for Biotechnology Information (NCBI)
- A National Institutes of Health (NIH)-funded initiative established to store molecular biology information
- Has grown dramatically since the completion of the Human Genome Project
- Has developed and maintains a variety of databases and resources
14 GenBank
- The NIH genetic sequence database
- Contains an annotated collection of all publicly available DNA sequences
- Part of the INSDC
- Updated regularly, with a new release approximately every two months
- Organised into several divisions
15 GenBank Divisions
- Expressed Sequence Tags (ESTs): short sub-sequences of a cDNA sequence
- High-throughput Genomic Sequences (HTGs): clone-based HTGs
- Complete microbial genomes
- Whole Genome Shotgun sequences (WGS)
- Transcriptome Shotgun Assembly sequences (TSA)
16 NCBI
17 Analysis Tools
18 Tutorials
19 NCBI DNA and RNA
20 NCBI Not only DNA data
21 EMBL-EBI
- Maintains the world's most comprehensive range of freely available and up-to-date molecular databases
- Offers online and live training events for using its resources: https://
22 EMBL EBI
23 EMBL EBI
24 ACCESSING DATA
25 Accessing Data
Why would you need to access sequence data?
- Know what the sequence of a gene is
- Identify variants in the sequence
- Compare your sequence to others
- Identify similar sequences
- Find diseases associated with variation in your gene of interest
26 Accessing Data
- It is important to be clear about what data you are searching for
- Most tools have been developed to link to all annotations for a particular query
- Both NCBI and EBI provide portals that let you search across all available databases with a single query
27 Example: FOXP2 Human
28 NCBI Portal Search: Literature, Genes, Health, Protein, Genomes, Chemicals
29 Popular Databases
- Gene: one-stop resource for all annotation information on a gene
- PubMed: extensive biomedical literature database
- Nucleotide: database of all DNA sequence data
- SNP: database of single nucleotide polymorphisms
- Protein: database of protein sequences
30 Popular Databases
- RefSeq: comprehensive, integrated, well-annotated set of reference sequences (genomic, transcript and protein)
- OMIM: Online Mendelian Inheritance in Man, a database of human genes and genetic phenotypes
- ClinVar: database of genomic variation and its relationship to human health
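The portal searches described in these slides can also be issued programmatically. The sketch below assumes NCBI's E-utilities `esearch` service and only builds request URLs; it does not send them. The mapping from the slide names to Entrez database names (`gene`, `pubmed`, `nuccore`, `snp`, `protein`, `omim`, `clinvar`) and the use of the `[gene]` and `[orgn]` field tags are illustrative assumptions, so check the E-utilities documentation for the fields each database accepts. RefSeq is left out of the mapping because its records are reached through the Nucleotide and Protein databases rather than a database of their own.

```python
from urllib.parse import urlencode

# NCBI E-utilities "esearch" endpoint (searches one Entrez database per request).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed mapping from the database names on the slides to Entrez "db" values.
ENTREZ_DB = {
    "Gene": "gene",
    "PubMed": "pubmed",
    "Nucleotide": "nuccore",
    "SNP": "snp",
    "Protein": "protein",
    "OMIM": "omim",
    "ClinVar": "clinvar",
}


def esearch_url(resource: str, term: str, retmax: int = 20) -> str:
    """Return an esearch URL for `term` in the slide-named `resource`.

    Only builds the URL; sending the request is left to the caller.
    """
    params = {"db": ENTREZ_DB[resource], "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Human FOXP2, restricted with the [gene] and [orgn] field tags.
query = "FOXP2[gene] AND Homo sapiens[orgn]"
for name in ("Gene", "Protein"):
    print(name, esearch_url(name, query))
```

`urlencode` takes care of escaping the brackets and spaces in the query, which is easy to get wrong when building the URL by hand.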
31 Gene Database Foxp2
32 Gene Database Foxp2
33 Gene Database Foxp2
34 Gene Database Foxp2
35 Gene Database Foxp2
36 Gene Database Foxp2
37 Gene Database Foxp2
38 Gene Database Foxp2
39 GenBank Entry Foxp2
40 GenBank Entry Foxp2
41 Accession Numbers
Each GenBank record, consisting of both a sequence and its annotations, is assigned a unique identifier called an accession number.
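To see how a record pairs its accession number with a sequence, the sketch below pulls the ACCESSION, VERSION and sequence lines out of a GenBank flat-file record. The record `SAMPLE` is a made-up toy (the accession `ZZ999999` is hypothetical), and `parse_genbank` is a deliberately minimal illustration that skips the FEATURES table and other annotation blocks; for real records a dedicated parser such as Biopython's `SeqIO` is the more usual choice.

```python
# Toy GenBank flat-file record; the accession ZZ999999 is hypothetical.
SAMPLE = """\
LOCUS       ZZ999999                  24 bp    mRNA    linear   PRI 01-JAN-2020
DEFINITION  Hypothetical toy record for illustration only.
ACCESSION   ZZ999999
VERSION     ZZ999999.1
FEATURES             Location/Qualifiers
     source          1..24
                     /organism="Homo sapiens"
ORIGIN
        1 atgatgcagg aatctgcgac agag
//
"""


def parse_genbank(text: str) -> dict:
    """Extract accession, version and sequence from a single GenBank record.

    Minimal sketch: annotation blocks such as FEATURES are skipped.
    """
    record = {"accession": None, "version": None, "sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_origin:
            # Sequence lines start with a position number, then blocks of bases.
            record["sequence"] += "".join(line.split()[1:]).upper()
        elif line.startswith("ACCESSION"):
            # The first accession listed on the line is the primary one.
            record["accession"] = line.split()[1]
        elif line.startswith("VERSION"):
            record["version"] = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
    return record


print(parse_genbank(SAMPLE))
```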
42 Accession Number Prefixes
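The prefix table for this slide did not survive in the transcript. As one well-documented case, NCBI RefSeq accessions use a two-letter prefix followed by an underscore to mark the molecule type, and an accession may carry a `.N` version suffix. The sketch below covers only a subset of common RefSeq prefixes and is not an exhaustive classifier; `split_version`, `refseq_type` and the example accession numbers are hypothetical, chosen only to show the format.

```python
from typing import Optional, Tuple

# A subset of NCBI RefSeq accession prefixes and the molecule type each denotes.
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule",
    "NG": "genomic region",
    "NT": "genomic contig",
    "NW": "genomic contig (WGS)",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "mRNA, predicted (model)",
    "XR": "non-coding RNA, predicted (model)",
    "XP": "protein, predicted (model)",
}


def split_version(accession: str) -> Tuple[str, Optional[int]]:
    """Split 'NM_123456.3' into ('NM_123456', 3); an unversioned ID gives None."""
    base, dot, version = accession.partition(".")
    return base, (int(version) if dot else None)


def refseq_type(accession: str) -> str:
    """Describe a RefSeq accession from its prefix, ignoring any version."""
    base, _ = split_version(accession)
    prefix, underscore, _ = base.partition("_")
    if not underscore or prefix not in REFSEQ_PREFIXES:
        return "not a recognised RefSeq accession"
    return REFSEQ_PREFIXES[prefix]


# Hypothetical accession numbers, used only to show the format.
for acc in ("NM_123456.3", "XP_123456.1", "ZZ999999"):
    print(acc, split_version(acc), refseq_type(acc))
```

Keeping the version separate matters in practice: the base accession identifies the record, while the version number changes whenever its sequence is updated.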
43 EMBL-EBI
44 EMBL EBI Foxp2
45 EMBL EBI Foxp2
46 EMBL EBI Foxp2
47 EMBL EBI FoxP2
48 EMBL EBI FoxP2
49 EMBL EBI FoxP2
50 Other Popular Resources at EBI
- Ensembl: resource for high-quality integrated annotation data
- UniProt: Universal Protein Resource, for protein sequence and functional annotation data
- PDBe: Protein Data Bank in Europe, a collection of 3D structural data
- InterPro: database of protein families, domains and conserved sites
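Identifiers from these EBI resources also follow fixed formats, which is handy for checking input before running a search. The sketch below uses the UniProtKB accession pattern published in UniProt's documentation, plus a simplified pattern for human Ensembl stable IDs (`ENS`, a feature letter, 11 digits and an optional version). The Ensembl pattern is a simplifying assumption: it ignores other species, whose IDs carry an extra species code, and feature types other than gene, transcript, protein and exon. The example identifiers are chosen to match the formats, not looked up.

```python
import re
from typing import Optional

# UniProtKB accession format, as documented by UniProt (6 or 10 characters).
UNIPROT_ACC = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)

# Simplified human Ensembl stable ID: ENS + feature letter + 11 digits (+ .version).
# Assumption: other species (e.g. ENSMUSG...) and other feature types are ignored.
ENSEMBL_HUMAN_ID = re.compile(r"ENS([GTPE])(\d{11})(?:\.(\d+))?")
ENSEMBL_FEATURES = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


def is_uniprot_accession(identifier: str) -> bool:
    """True if the whole string matches the UniProtKB accession format."""
    return UNIPROT_ACC.fullmatch(identifier) is not None


def ensembl_feature_type(identifier: str) -> Optional[str]:
    """Return 'gene', 'transcript', 'protein' or 'exon' for a human Ensembl ID."""
    match = ENSEMBL_HUMAN_ID.fullmatch(identifier)
    return ENSEMBL_FEATURES[match.group(1)] if match else None


print(is_uniprot_accession("P12345"), is_uniprot_accession("A0A023GPI8"))
print(ensembl_feature_type("ENSG00000000001.5"), ensembl_feature_type("ENSMUSG0000"))
```

A format check only says an identifier is well formed; whether it exists still has to be confirmed against the database itself.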
51 Specialised Databases
- A large number of specialised databases exist
- Most of their sequences are also in GenBank/EMBL-Bank
- May contain whole genomes
- May contain specialised resources
- Contain specific tools for mining the data
52 Specialised Databases
- Plasmodium: http://plasmodb.org
- Sanger's specialised collections: http://
- Hepatitis Database: http://hcv.lanl.gov/content/hcv-db/index
- Influenza Research Database: http://
53 Summary
- There is a large amount of data out there
- Primary databases store raw sequence data
- Secondary databases provide information on the annotation of the sequence data
- It is important to know how and where data is stored
- NCBI and EBI are the two most popular resources for extracting biological data