Annotation. (Chapter 8)


1 Annotation (Chapter 8)

2 Genome annotation: Genome annotation is the process of attaching biological information to sequences: identifying elements on the genome, attaching biological information to those elements, and storing the information in biological databases.

3 Genome annotation: gene function, structure and localisation, homology, etc.

4 Genome annotation: Many resources are available. Computational approaches and infrastructure are needed to use the information efficiently. Bioconductor (BioC) provides annotation data and software packages that allow easy use of these resources in our analysis.

5 Working with annotation in BioC: Annotation packages provide a computational framework for accessing biological metadata.

6 Annotation in BioC (figure: Array1 to Array5 and their probes)

7 Annotation packages in BioC: Annotation data packages: chip, species, function. Annotation software packages: annotate, AnnotationDbi.

8 Chip annotation packages: hgu95av2.db, illuminahumanv3probeid

9 Species annotation packages: org.Hs.eg.db (human), org.Mm.eg.db (mouse)

10 Functional annotation packages: GO.db, KEGG.db

11 Probe annotation resources: NCBI GenBank, Ensembl, UCSC Genome Browser. Most databases are cross-referenced extensively. Most useful IDs: RefSeq ID, Entrez Gene ID, Ensembl gene/protein ID, gene symbol.
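Because the databases are cross-referenced, annotating a probe usually means chaining lookups from one ID type to the next. The sketch below illustrates that idea only; the tables, the function name `probes_to_symbols` and every identifier in it are hypothetical placeholders, not real probe or Entrez Gene IDs and not the BioC API.

```python
# Toy cross-reference tables. All identifiers are hypothetical
# placeholders, not real probe IDs, Entrez Gene IDs or gene symbols.
PROBE_TO_ENTREZ = {
    "probe_001": ["1001"],
    "probe_002": ["1002", "1003"],  # one probe may match several genes
    "probe_003": [],                # a probe without any annotation
}
ENTREZ_TO_SYMBOL = {"1001": "GENEA", "1002": "GENEB", "1003": "GENEC"}


def probes_to_symbols(probe_ids):
    """Map each probe ID to its gene symbols by going through Entrez Gene IDs.

    Unknown or unannotated probes map to an empty list.
    """
    result = {}
    for probe in probe_ids:
        entrez_ids = PROBE_TO_ENTREZ.get(probe, [])
        result[probe] = [
            ENTREZ_TO_SYMBOL[e] for e in entrez_ids if e in ENTREZ_TO_SYMBOL
        ]
    return result


if __name__ == "__main__":
    for probe, symbols in probes_to_symbols(
        ["probe_001", "probe_002", "probe_003", "probe_999"]
    ).items():
        print(probe, symbols)
```

Returning a list per probe, rather than a single value, keeps one-to-many and missing mappings explicit instead of silently dropping them.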

12 Probe annotation: Entrez Gene is a catalog of genetic loci that connects curated sequence information to official nomenclature. UniGene defines sequence clusters. UniGene focuses on protein-coding genes of the nuclear genome (excluding rRNA and mitochondrial sequences). RefSeq is a non-redundant set of transcripts and proteins of known genes for many species, including human, mouse and rat.

13 Functional annotation: Functional category: Gene Ontology. Pathways: KEGG, GenMAPP, BioCarta. Protein interactions: STRING, BIND.

14 Gene Ontology: Gene Ontology (GO) is a structured vocabulary of terms describing gene products according to molecular function, biological process and cellular component.

15 Gene Ontology: Molecular function (elemental activity/task): the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity. Biological process (biological goal or objective): broad biological goals, such as mitosis or immune response, that are accomplished by ordered assemblies of molecular functions. Cellular component (location or complex): subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere and RNA polymerase II holoenzyme.

16 GO: inflammatory response (example). GO is structured as a Directed Acyclic Graph (DAG). Genes can be associated with many GO terms. Specificity increases from parent terms to child terms.
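In a DAG a term can have more than one parent, and in GO a gene annotated to a term is also considered annotated to all of that term's ancestors. A minimal sketch of this propagation follows; the graph, the gene names and the function names are made up for illustration and are not taken from GO or GO.db.

```python
# Toy ontology: child term -> list of parent terms. Hand-made for
# illustration; these terms and edges are NOT from the real Gene Ontology.
PARENTS = {
    "root": [],
    "broad_term_1": ["root"],
    "broad_term_2": ["root"],
    # Two parents: this is what makes the structure a DAG rather than a tree.
    "specific_term": ["broad_term_1", "broad_term_2"],
}

# Hypothetical direct annotations: gene -> most specific terms assigned to it.
GENE_TO_TERMS = {"GENEA": ["specific_term"], "GENEB": ["broad_term_1"]}


def ancestors(term):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def all_terms_for_gene(gene):
    """Direct annotations of a gene plus all ancestors of those terms."""
    terms = set()
    for term in GENE_TO_TERMS.get(gene, []):
        terms.add(term)
        terms |= ancestors(term)
    return terms


if __name__ == "__main__":
    print(sorted(all_terms_for_gene("GENEA")))
```

The `seen` set prevents a shared ancestor such as `root` from being visited once per path.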

17 KEGG pathway database: KEGG is the Kyoto Encyclopedia of Genes and Genomes. It covers molecular interaction and reaction networks for metabolism, various cellular processes, and human diseases. Entries are manually entered from published materials.

18 STRING database: Interaction network. Each interaction has a combined score derived from several sources.
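One simple way to merge several evidence scores into a single combined score is to treat each score as an independent probability that the interaction is real and compute the probability that at least one source is right. The sketch below shows only that simplified rule; the function name is hypothetical, and the scoring that STRING actually applies is documented by STRING and may include corrections that are not modelled here.

```python
from math import prod


def combine_scores(scores):
    """Combine per-source scores in [0, 1] as 1 - product(1 - s).

    Simplified illustration that assumes the sources are independent;
    not a reimplementation of STRING's scoring.
    """
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


if __name__ == "__main__":
    # Two sources that each support the interaction with score 0.5.
    print(combine_scores([0.5, 0.5]))
```

With this rule the combined score is never lower than the best single score, so adding a weak source cannot reduce the confidence in an interaction.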

19 plus many more.

20 Software annotation packages: annotate. Associates microarray and other genomic data in real time with biological metadata from web databases such as GenBank, Entrez Gene and PubMed. Incorporates the results of statistical analysis in HTML reports with links to annotation web resources.

21 Annotate package

22 Software annotation packages: AnnotationDbi. Assembles and processes genomic annotation data from databases such as GenBank, the Gene Ontology Consortium, Entrez Gene, UniGene and the UCSC Human Genome Project. Customized annotation libraries can also be assembled.

23 AnnotationDbi package creates annotation data packages

24 Exercises, annotation: working with chip-level annotation packages.