Introduction to Databases and Resources Biological Databases and Resources


1 Introduction to Bioinformatics Online Course: IBT Introduction to Databases and Resources Biological Databases and Resources

2 Learning Objectives Introduction to Databases and Resources
- Understand how bioinformatics data is stored and organised
- Describe the different types of data found at the NCBI and EBI resources
- Locate key bioinformatics databases and resources

3 Learning Outcomes Introduction to Databases and Resources
- Understand the structure and layout of the NCBI and EBI data resources
- Understand the difference between databases, tools and repositories
- Search for data from specific databases using accession numbers or gene names
- Use selected tools at NCBI and EBI

4 Data

5 Introduction
- A range of different online databases and resources exists
- Need to know:
  - Which databases and resources exist
  - What tools are available to mine these resources
  - What tools are available to search across resources

6 Biological databases

7 Nucleic Acids Research

8 Databases
Databases are:
- Public or private
- Access and submission
- Protein, nucleotide, structure, literature, annotation
- Generalised or specialised
- Curated or non-curated
- Sequence or genome-centred

9 Primary Databases

10 Primary Databases
- International Nucleotide Sequence Database Collaboration (INSDC)
- Genomic sequence data stored in 3 public databases (GenBank, ENA and DDBJ)
- Each has its own accession numbers and tools

11 Secondary Databases
- In-depth databases built upon primary sequence data
- Provide several different resources and annotations

12 Most Popular Bioinformatics Resources
- National Center for Biotechnology Information (NCBI)
- European Bioinformatics Institute (EMBL-EBI)

13 NCBI
- National Center for Biotechnology Information (NCBI)
- National Institutes of Health funded initiative established to store molecular biology information
- Has grown dramatically since the completion of the Human Genome Project
- Has developed and maintains a variety of databases and resources

14 GenBank
- The NIH genetic sequence database
- Contains an annotated collection of all publicly available DNA sequences
- Part of INSDC
- The database is updated on a regular basis, approximately every two months
- Several divisions within GenBank

15 GenBank Divisions
- Expressed sequence tags (ESTs)
  - Short sub-sequences of a cDNA sequence
- High-throughput Genomic Sequences (HTGs)
  - Clone-based HTGs
- Complete Microbial Genomes
- Whole Genome Shotgun Sequences (WGS)
- Transcriptome Shotgun Assembly Sequences (TSA)

16 NCBI

17 Analysis Tools

18 Tutorials

19 NCBI DNA and RNA

20 NCBI Not only DNA data

21 EMBL-EBI
- Maintains the world's most comprehensive range of freely available and up-to-date molecular databases
- Offers online and live training events for using its resources
- https://

22-23 EMBL-EBI

24 ACCESSING DATA

25 Accessing Data
Why would you need to access sequence data?
- Know what the sequence of a gene is
- Identify variants in the sequence
- Compare your sequence to others
- Identify similar sequences
- Find diseases associated with variation in your gene of interest

26 Accessing Data
- Important to be clear what data you are searching for
- Most tools have been developed to link to all annotations for a particular query
- Both NCBI and EBI provide portals that allow you to search across all available databases with a single query

27 Example: FOXP2 Human

28 NCBI Portal Search
- Literature
- Genes
- Health
- Protein
- Genomes
- Chemicals

29 Popular Databases
- Gene: One-stop resource for all annotation information for a gene
- PubMed: Extensive biomedical literature database
- Nucleotide: Database of all DNA sequence data
- SNP: Database of single nucleotide polymorphisms
- Protein: Database of protein sequences

30 Popular Databases
- RefSeq: Comprehensive, integrated, well-annotated set of reference sequences (genomic, transcript and protein)
- OMIM: Online Mendelian Inheritance in Man, a database of human genes and genetic phenotypes
- ClinVar: Database of genomic variation and its relationship to human health
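
As a hedged illustration of searching one specific NCBI database by gene name, the sketch below builds a query URL for the NCBI E-utilities `esearch` endpoint. The helper name `build_esearch_url` and the slide-name-to-identifier mapping are assumptions made for this example and do not come from the slides. The code only constructs the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities esearch endpoint.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Database names as they appear on the slides, mapped to the identifiers
# that E-utilities expects in its "db" parameter (illustrative subset).
NCBI_DATABASES = {
    "Gene": "gene",
    "PubMed": "pubmed",
    "Nucleotide": "nuccore",
    "SNP": "snp",
    "Protein": "protein",
    "OMIM": "omim",
    "ClinVar": "clinvar",
}


def build_esearch_url(database, term, retmax=20):
    """Return an esearch URL for one database (hypothetical helper)."""
    params = {
        "db": NCBI_DATABASES[database],  # which database to search
        "term": term,                    # the Entrez query string
        "retmax": retmax,                # maximum number of IDs to return
        "retmode": "json",               # ask for a JSON response
    }
    return ESEARCH_BASE + "?" + urlencode(params)


# Search the Gene database for the human FOXP2 gene.
url = build_esearch_url("Gene", "FOXP2[Gene Name] AND Homo sapiens[Organism]")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching record IDs. Swapping the first argument searches a different database with the same term.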

31-38 Gene Database FOXP2

39-40 GenBank Entry FOXP2

41 Accession Numbers
- Each GenBank record, consisting of both a sequence and its annotations, is assigned a unique identifier called an accession number
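
To make the idea of accession number formats concrete, here is a minimal sketch that guesses the type of an accession from its shape. The function name `classify_accession` and the regular expressions are assumptions for illustration. They cover only a simplified subset of the common formats (for example, RefSeq accessions with letters after the underscore are not handled), and the example strings are used only to show formats.

```python
import re

# Simplified, assumed patterns. An optional ".N" version suffix is allowed.
ACCESSION_PATTERNS = [
    # RefSeq: two-letter prefix, underscore, digits (e.g. NM_, NP_, NC_).
    ("RefSeq", re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")),
    # INSDC nucleotide: 1 letter + 5 digits, or 2 letters + 6 or 8 digits.
    ("INSDC nucleotide",
     re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(\.\d+)?$")),
    # INSDC protein: 3 letters + 5 or 7 digits.
    ("INSDC protein", re.compile(r"^([A-Z]{3}\d{5}|[A-Z]{3}\d{7})(\.\d+)?$")),
]


def classify_accession(accession):
    """Return a rough label for an accession string (hypothetical helper)."""
    text = accession.strip().upper()
    for label, pattern in ACCESSION_PATTERNS:
        if pattern.match(text):
            return label
    return "unrecognised"


for example in ["U49845.1", "NM_014491.4", "AAA98665.1", "foxp2"]:
    print(example, "->", classify_accession(example))
```

A gene name such as `foxp2` falls through to `unrecognised`, which is a cheap way to decide whether a search box entry should be treated as an accession or as free text.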

42 Accession Number Prefixes

43 EMBL-EBI

44-49 EMBL-EBI FOXP2

50 Other popular resources at EBI
- Ensembl: Resource for high-quality integrated annotation data
- UniProt: Universal Protein Resource for protein sequence and functional annotation data
- PDBe: Protein Data Bank in Europe, a collection of 3D structural data
- InterPro: Database of protein families, domains and conserved sites
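
Several of these resources can also be queried programmatically. The sketch below builds request URLs for the Ensembl REST `lookup/symbol` endpoint and the UniProt REST search endpoint, using FOXP2 in human as the running example. The helper names `ensembl_lookup_url` and `uniprot_search_url` are assumptions for this illustration, and the UniProt query fields shown (`gene`, `organism_id`) should be checked against the current UniProt documentation before use. No network request is made.

```python
from urllib.parse import quote, urlencode

ENSEMBL_REST = "https://rest.ensembl.org"
UNIPROT_REST = "https://rest.uniprot.org"


def ensembl_lookup_url(symbol, species="homo_sapiens"):
    """URL that looks up a gene by symbol in Ensembl (hypothetical helper)."""
    path = "/lookup/symbol/{}/{}".format(quote(species), quote(symbol))
    # safe="/" keeps the slash in "application/json" readable in the URL.
    query = urlencode({"content-type": "application/json"}, safe="/")
    return ENSEMBL_REST + path + "?" + query


def uniprot_search_url(gene, taxon_id=9606):
    """URL that searches UniProtKB by gene name and NCBI taxonomy ID
    (hypothetical helper; 9606 is the taxonomy ID for human)."""
    query = "gene:{} AND organism_id:{}".format(gene, taxon_id)
    return (UNIPROT_REST + "/uniprotkb/search?"
            + urlencode({"query": query, "format": "json"}))


print(ensembl_lookup_url("FOXP2"))
print(uniprot_search_url("FOXP2"))
```

Keeping URL construction separate from the actual HTTP call makes it easy to inspect a query in a browser before scripting it.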

51 Specialised Databases
- A large number of specialised databases exist
- Most of the sequences are also in GenBank/EMBL-Bank
- May contain whole genomes
- May contain specialised resources
- Contain specific tools for mining the data

52 Specialised Databases
- Plasmodium: http://plasmodb.org
- Sanger's specialised collections: http://
- Hepatitis Database: http://hcv.lanl.gov/content/hcv-db/index
- Influenza Research Database: http://

53 Summary
- A large amount of data is available
- Primary databases store raw sequence data
- Secondary databases provide information on the annotation of the sequence data
- Important to know how and where data is stored
- NCBI and EBI are the two most popular resources for extracting biological data