Introduction to Bioinformatics
1 Introduction to Bioinformatics September 1, 2006 Jonathan Pevsner, Ph.D.
2 Teaching assistants: Hugh Cahill, Jennifer Turney, Meg Zupancic
3 Who is taking this course? People with very diverse backgrounds in biology. People with diverse backgrounds in computer science and biostatistics. Most people have a favorite gene, protein, or disease.
4 What are the goals of the course? To provide an introduction to bioinformatics with a focus on the National Center for Biotechnology Information (NCBI) and EBI. To focus on the analysis of DNA, RNA and proteins. To introduce you to the analysis of genomes. To combine theory and practice to help you solve research problems.
5 Themes throughout the course: textbooks, web sites, literature references, gene/protein families, computer labs
6 Textbook The course textbook is J. Pevsner, Bioinformatics and Functional Genomics (Wiley, 2003). The chapters contain content, lab exercises, and quizzes that were developed in this course over the past six years. A few copies will be available on reserve at Welch Library for those of you who do not want to buy a copy (go up to the 2nd floor), and the library has six more copies. Several other bioinformatics texts are available: Baxevanis and Ouellette; David Mount; Durbin et al.
7 Web sites The course website is reached via: (or Google pevsnerlab courses). This site contains the powerpoints for each lecture. The textbook website is: This has 1000 URLs, organized by chapter. This site also contains the same powerpoints. The weekly quizzes are on my website: Once you log in and take a quiz, you will get instant feedback. You can use Moodle to ask questions as well.
8 Literature references You are encouraged to read original source articles. They will enhance your understanding of the material. Reading will be assigned.
9 Themes throughout the course: gene/protein families We will use retinol-binding protein 4 (RBP4) as a model gene/protein throughout the course. RBP4 is a member of the lipocalin family. It is a small, abundant carrier protein. We will study it in a variety of contexts including --sequence alignment --gene expression --protein structure --phylogeny --homologs in various species We will also use other examples, such as the globins and the pol protein of HIV-1
10
11 The HIV-1 pol gene encodes three proteins: aspartyl protease (PR), reverse transcriptase (RT), and integrase (IN)
12 Themes throughout the course: computer labs There is a computer lab each Friday. This is a chance to gain practical experience using a variety of web resources. You can do the lab on your own, ahead of time. However, during the Friday lab you can get help on problems, and in some cases the computers will have specialized software.
13 Grading 40% ten moodle quizzes (corresponding to chapters 2-11) 30% final exam October 25 (in class) 30% discovery of a novel gene: --Find the novel gene by the end of September, and turn in the final report, with phylogenetic tree, by October 25 --Instructions are posted on the course website --We will discuss this project in detail in the next two weeks.
14 Grading Quizzes are taken at the moodle website, and are due one week after the relevant lecture.
   Ten quizzes:
   4% Chapter 2 quiz (sequences)
   4% Chapter 3 quiz (alignment)
   4% Chapter 4 quiz (BLAST)
   4% Chapter 5 quiz (advanced BLAST)
   4% Chapter 6 quiz (RNA)
   4% Chapter 7 quiz (microarrays)
   4% Chapter 8 quiz (proteomics)
   4% Chapter 9 quiz (protein structure)
   4% Chapter 10 quiz (multiple alignment)
   4% Chapter 11 quiz (phylogeny)
   30% find-a-gene project (due October 25)
   30% final exam October 25 (in class)
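As a quick check on the weights above, here is a minimal Python sketch of how a course grade would be computed from them. The weights come from the grading slide; the function name and the example scores are made up for illustration.

```python
# Weights taken from the grading slide; all scores below are hypothetical.
QUIZ_WEIGHT = 0.04      # each of the ten chapter quizzes (chapters 2-11)
PROJECT_WEIGHT = 0.30   # find-a-gene project
FINAL_WEIGHT = 0.30     # final exam


def course_grade(quiz_scores, project_score, final_score):
    """Return the weighted course grade (0-100) from scores given on a 0-100 scale."""
    if len(quiz_scores) != 10:
        raise ValueError("expected ten quiz scores")
    quiz_part = sum(QUIZ_WEIGHT * score for score in quiz_scores)
    return quiz_part + PROJECT_WEIGHT * project_score + FINAL_WEIGHT * final_score


# The three components should account for the whole grade.
total_weight = 10 * QUIZ_WEIGHT + PROJECT_WEIGHT + FINAL_WEIGHT

# Hypothetical student: 90 on every quiz, 80 on the project, 70 on the final.
example = course_grade([90] * 10, 80, 70)
print(round(total_weight, 6), round(example, 2))
```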
15 Outline for today (chapters 1 and 2): Definition of bioinformatics. Overview of the NCBI website. Accessing information about DNA and proteins --Definition of an accession number --Four ways to find information on proteins and DNA. Access to biomedical literature.
16 What is bioinformatics? Interface of biology and computers. Analysis of proteins, genes and genomes using computer algorithms and computer databases. Genomics is the analysis of genomes. The tools of bioinformatics are used to make sense of the billions of base pairs of DNA that are sequenced by genomics projects.
17 Top ten challenges for bioinformatics [1] Precise models of where and when transcription will occur in a genome (initiation and termination) [2] Precise, predictive models of alternative RNA splicing [3] Precise models of signal transduction pathways; ability to predict cellular responses to external stimuli [4] Determining protein:DNA, protein:RNA, protein:protein recognition codes [5] Accurate ab initio protein structure prediction
18 Top ten challenges for bioinformatics [6] Rational design of small molecule inhibitors of proteins [7] Mechanistic understanding of protein evolution [8] Mechanistic understanding of speciation [9] Development of effective gene ontologies: systematic ways to describe gene and protein function [10] Education: development of bioinformatics curricula Source: Ewan Birney, Chris Burge, Jim Fickett
19 On bioinformatics Science is about building causal relations between natural phenomena (for instance, between a mutation in a gene and a disease). The development of instruments to increase our capacity to observe natural phenomena has, therefore, played a crucial role in the development of science - the microscope being the paradigmatic example in biology. With the human genome, the natural world takes an unprecedented turn: it is better described as a sequence of symbols. Besides high-throughput machines such as sequencers and DNA chip readers, the computer and the associated software becomes the instrument to observe it, and the discipline of bioinformatics flourishes.
20 On bioinformatics However, as the separation between us (the observers) and the phenomena observed increases (from organism to cell to genome, for instance), instruments may capture phenomena only indirectly, through the footprints they leave. Instruments therefore need to be calibrated: the distance between the reality and the observation (through the instrument) needs to be accounted for. This issue of Genome Biology is about calibrating instruments to observe gene sequences; more specifically, computer programs to identify human genes in the sequence of the human genome. Martin Reese and Roderic Guigó, Genome Biology (Suppl I):S1, introducing EGASP, the Encyclopedia of DNA Elements (ENCODE) Genome Annotation Assessment Project
21 bioinformatics medical informatics Tool-users public health informatics databases algorithms Tool-makers infrastructure
22 Three perspectives on bioinformatics: the cell, the organism, the tree of life. Page 4
23
24 DNA RNA protein phenotype Page 5
25 Time of development Body region, physiology, pharmacology, pathology Page 5
26 After Pace NR (1997) Science 276:734 Page 6
27 DNA RNA protein phenotype
28 Growth of GenBank (Fig. 2.1, page 17). Plot axes: year; base pairs of DNA (billions); sequences (millions). Updated: >40b base pairs
29 Growth of GenBank, December 1982 to June 2006. Plot axes: base pairs of DNA (billions); sequences (millions)
30 Growth of the International Nucleotide Sequence Database Collaboration. Plot: base pairs of DNA (billions) contributed by GenBank, EMBL, and DDBJ
31 Central dogma of molecular biology: DNA → RNA → protein. Central dogma of bioinformatics and genomics: genome → transcriptome → proteome
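The DNA → RNA → protein flow can be illustrated with a short Python sketch. It is a toy example under stated assumptions: the input sequence is hypothetical, it is treated as the coding strand, and the codon table is deliberately partial (only the codons this example needs), so it is not a general translator.

```python
# Partial codon table: only the codons used by the toy sequence below.
# A real translator would need all 64 codons.
CODON_TABLE = {"AUG": "M", "AAG": "K", "UGG": "W", "GUG": "V", "UAA": "*"}


def transcribe(coding_dna):
    """DNA -> RNA: the mRNA matches the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """RNA -> protein: read codons from position 0 until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGAAGTGGGTGTAA"      # hypothetical coding-strand sequence
mrna = transcribe(dna)       # AUGAAGUGGGUGUAA
protein = translate(mrna)    # MKWV
print(mrna, protein)
```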
32 DNA RNA protein phenotype genomic DNA databases cDNA ESTs UniGene protein sequence databases Fig. 2.2 Page 20
33 There are three major public DNA databases EMBL GenBank DDBJ The underlying raw DNA sequences are identical Page 16
34 There are three major public DNA databases. EMBL: housed at EBI, the European Bioinformatics Institute. GenBank: housed at NCBI, the National Center for Biotechnology Information. DDBJ: housed in Japan. Page 16
35 >100,000 species are represented in GenBank (Table 2-1, page 17)
   all species   128,941
   viruses         6,137
   bacteria       31,262
   archaea         2,100
   eukaryota      87,147
36 Taxonomy nodes at NCBI 8/06
37 The most sequenced organisms in GenBank: Homo sapiens, 10.7 billion bases; Mus musculus, 6.5b; Rattus norvegicus, 5.6b; Danio rerio, 1.7b; Zea mays, 1.4b; Oryza sativa, 0.8b; Drosophila melanogaster, 0.7b; Gallus gallus, 0.5b; Arabidopsis thaliana, 0.5b. Updated GenBank release Table 2-2 Page 18
38 The most sequenced organisms in GenBank: Homo sapiens, 11.2 billion bases; Mus musculus, 7.5b; Rattus norvegicus, 5.7b; Danio rerio, 2.1b; Bos taurus, 1.9b; Zea mays, 1.4b; Oryza sativa (japonica), 1.2b; Xenopus tropicalis, 0.9b; Canis familiaris, 0.8b; Drosophila melanogaster, 0.7b. Updated GenBank release Table 2-2 Page 18
39 The most sequenced organisms in GenBank: Homo sapiens, 12.3 billion bases; Mus musculus, 8.0b; Rattus norvegicus, 5.7b; Bos taurus, 3.5b; Danio rerio, 2.5b; Zea mays, 1.8b; Oryza sativa (japonica), 1.5b; Strongylocentrotus purpuratus, 1.2b; Sus scrofa, 1.0b; Xenopus tropicalis, 1.0b. Updated GenBank release Table 2-2 Page 18
40 National Center for Biotechnology Information (NCBI) Page 24
41 Fig. 2.5 Page 25
42 Fig. 2.5 Page 25
43 PubMed is the National Library of Medicine's search service; 16 million citations in MEDLINE; links to participating online journals; PubMed tutorial (via Education on side bar). Page 24
44 Entrez integrates the scientific literature; DNA and protein sequence databases; 3D protein structure data; population study data sets; assemblies of complete genomes Page 24
45 Entrez is a search and retrieval system that integrates NCBI databases Page 24
46 BLAST is the Basic Local Alignment Search Tool: NCBI's sequence similarity search tool; supports analysis of DNA and protein databases; 100,000 searches per day. Page 25
47 OMIM is Online Mendelian Inheritance in Man: a catalog of human genes and genetic disorders, edited by Dr. Victor McKusick and others at JHU. Page 25
48 Books is a searchable resource of on-line books. Page 26
49 TaxBrowser is a browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses); taxonomy information such as genetic codes; molecular data on extinct organisms. Page 26
50 The Structure site includes: Molecular Modelling Database (MMDB); biopolymer structures obtained from the Protein Data Bank (PDB); Cn3D (a 3D-structure viewer); vector alignment search tool (VAST). Page 26
51 Accessing information on molecular sequences Page 26
52 Accession numbers are labels for sequences NCBI includes databases (such as GenBank) that contain information on DNA, RNA, or protein sequences. You may want to acquire information beginning with a query such as the name of a protein of interest, or the raw nucleotides comprising a DNA sequence of interest. DNA sequences and other molecular data are tagged with accession numbers that are used to identify a sequence or other record relevant to molecular data. Page 26
53 What is an accession number? An accession number is a label that is used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence. Examples (all for retinol-binding protein, RBP4):
   DNA
     X02775     GenBank genomic DNA sequence
     NT_        Genomic contig
     Rs         dbSNP (single nucleotide polymorphism)
   RNA
     N          An expressed sequence tag (1 of 170)
     NM_        RefSeq DNA sequence (from a transcript)
   Protein
     NP_        RefSeq protein
     AAC02945   GenBank protein
     Q          SwissProt protein
     KT7        Protein Data Bank structure record
   Page 27
54 Four ways to access DNA and protein sequences [1] Entrez Gene with RefSeq [2] UniGene [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI) [4] ExPASy Sequence Retrieval System (separate from NCBI) Note: LocusLink at NCBI was recently retired. The third printing of the book has updated these sections (pages 27-31). Page 27
55 4 ways to access protein and DNA sequences [1] Entrez Gene with RefSeq Entrez Gene is a great starting point: it collects key information on each gene/protein from major databases. It covers all major organisms. RefSeq provides a curated, optimal accession number for each DNA (NM_006744) or protein (NP_007635) Page 27
56 From the NCBI home page, type rbp4 and hit Go revised Fig. 2.7 Page 29
57 revised Fig. 2.7 Page 29
58
59
60 By applying limits, there are now just two entries
61 Entrez Gene (top of page) Note that links to many other RBP4 database entries are available revised Fig. 2.8 Page 30
62 Entrez Gene (middle of page)
63 Entrez Gene (bottom of page)
64 Fig. 2.9 Page 32
65 Fig. 2.9 Page 32
66 Fig. 2.9 Page 32
67 FASTA format Fig Page 32
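FASTA is a plain-text format in which each record starts with a header line beginning with ">" and is followed by one or more lines of sequence. The Python sketch below is a minimal reader for that layout; the function name and the two example records are hypothetical and are not taken from the figure.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence.

    A header line starts with '>'; the sequence lines that follow it are
    joined into one string. Blank lines are ignored.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical records, for illustration only.
example = """>seq1 hypothetical protein fragment
MKTAYIAK
QRQISFVK
>seq2 hypothetical DNA fragment
ATGAAGTGG
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

A dict keeps the sketch short; for real work, an established parser such as the one in Biopython would likely be a better choice.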
69 NCBI's important RefSeq project: best representative sequences. RefSeq (accessible via the main page of NCBI) provides an expertly curated accession number that corresponds to the most stable, agreed-upon reference version of a sequence. RefSeq identifiers include the following formats:
   Complete genome       NC_######
   Complete chromosome   NC_######
   Genomic contig        NT_######
   mRNA (DNA format)     NM_######   e.g. NM_
   Protein               NP_######   e.g. NP_
   Page 29-30
70 NCBI's RefSeq project: accession for genomic, mRNA, protein sequences
   Accession   Molecule   Method            Note
   AC_         Genomic    Mixed             Alternate complete genomic
   AP_         Protein    Mixed             Protein products; alternate
   NC_         Genomic    Mixed             Complete genomic molecules
   NG_         Genomic    Mixed             Incomplete genomic regions
   NM_         mRNA       Mixed             Transcript products; mRNA
   NM_         mRNA       Mixed             Transcript products; 9-digit
   NP_         Protein    Mixed             Protein products;
   NP_         Protein    Curation          Protein products; 9-digit
   NR_         RNA        Mixed             Non-coding transcripts
   NT_         Genomic    Automated         Genomic assemblies
   NW_         Genomic    Automated         Genomic assemblies
   NZ_ABCD     Genomic    Automated         Whole genome shotgun data
   XM_         mRNA       Automated         Transcript products
   XP_         Protein    Automated         Protein products
   XR_         RNA        Automated         Transcript products
   YP_         Protein    Auto. & Curated   Protein products
   ZP_         Protein    Automated         Protein products
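Because the RefSeq prefix encodes the molecule type, an accession can be classified by looking at its first three characters. The Python sketch below encodes the prefix table from this slide; the function name is hypothetical, and the NP_ row merges the two methods the slide lists for it. Accessions without a RefSeq prefix (such as the GenBank accession X02775) are reported as not RefSeq.

```python
# Prefix -> (molecule, method), transcribed from the RefSeq table on the slide.
# NP_ appears twice on the slide (Mixed, and Curation for 9-digit accessions),
# so the two methods are merged here.
REFSEQ_PREFIXES = {
    "AC_": ("Genomic", "Mixed"),
    "AP_": ("Protein", "Mixed"),
    "NC_": ("Genomic", "Mixed"),
    "NG_": ("Genomic", "Mixed"),
    "NM_": ("mRNA", "Mixed"),
    "NP_": ("Protein", "Mixed or Curation"),
    "NR_": ("RNA", "Mixed"),
    "NT_": ("Genomic", "Automated"),
    "NW_": ("Genomic", "Automated"),
    "NZ_": ("Genomic", "Automated"),
    "XM_": ("mRNA", "Automated"),
    "XP_": ("Protein", "Automated"),
    "XR_": ("RNA", "Automated"),
    "YP_": ("Protein", "Auto. & Curated"),
    "ZP_": ("Protein", "Automated"),
}


def classify_refseq(accession):
    """Return (molecule, method) for a RefSeq accession, or None if the
    accession does not start with a known RefSeq prefix."""
    prefix = accession.strip().upper()[:3]
    return REFSEQ_PREFIXES.get(prefix)


# NM_006744 and NP_007635 are the RBP4 RefSeq accessions quoted in the slides;
# X02775 is the GenBank genomic DNA accession, so it is not a RefSeq entry.
for acc in ["NM_006744", "NP_007635", "X02775"]:
    print(acc, classify_refseq(acc))
```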
71 Four ways to access DNA and protein sequences [1] Entrez Gene with RefSeq [2] UniGene [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI) [4] ExPASy Sequence Retrieval System (separate from NCBI) Page 31
72 DNA RNA protein complementary DNA (cDNA) UniGene Fig. 2.3 Page 23
73 UniGene: unique genes via ESTs Find UniGene at NCBI: UniGene clusters contain many expressed sequence tags (ESTs), which are DNA sequences (typically 500 base pairs in length) corresponding to the mRNA from an expressed gene. ESTs are sequenced from a complementary DNA (cDNA) library. UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene you get information on its abundance and its regional distribution. Pages 20-21
74 Cluster sizes in UniGene This is a gene with 1 EST associated; the cluster size is 1 Fig. 2.3 Page 23
75 Cluster sizes in UniGene This is a gene with 10 ESTs associated; the cluster size is 10
76 Cluster sizes in UniGene (human). Table columns: cluster size (ESTs); number of clusters. Surviving entries: 1 42, , , , , , , , ,000-30,000 8. UniGene build 194, 8/06
77 UniGene: unique genes via ESTs Conclusion: UniGene is a useful tool to look up information about expressed genes. UniGene displays information about the abundance of a transcript (expressed gene), as well as its regional distribution of expression (e.g. brain vs. liver). We will discuss UniGene further on September 18 (gene expression). Page 31
78 Four ways to access DNA and protein sequences [1] Entrez Gene with RefSeq [2] UniGene [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI) [4] ExPASy Sequence Retrieval System (separate from NCBI) Page 31
79 Ensembl to access protein and DNA sequences Try Ensembl at for a premier human genome web browser. We will encounter Ensembl as we study the human genome, BLAST, and other topics.
80 click human
81 enter RBP4
82
83 Four ways to access DNA and protein sequences [1] Entrez Gene with RefSeq [2] UniGene [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI) [4] ExPASy Sequence Retrieval System (separate from NCBI) Page 33
84 ExPASy to access protein and DNA sequences ExPASy sequence retrieval system (ExPASy = Expert Protein Analysis System) Visit Page 33
85 Fig Page 33
86
87 Example of how to access sequence data: HIV-1 pol There are many possible approaches. Begin at the main page of NCBI, and type an Entrez query: hiv-1 pol Page 34
88
89 Searching for HIV-1 pol: Following the genome link yields a manageable three results Page 34
90 Example of how to access sequence data: HIV-1 pol For the Entrez query "hiv-1 pol" there are about 40,000 nucleotide or protein records (and >100,000 records for a search for "hiv-1"), but these can be reduced in two easy steps: --specify the organism, e.g. hiv-1[organism] --limit the output to RefSeq! Page 34
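The two narrowing steps (organism field tag, then RefSeq only) can also be expressed as a single query string. The Python sketch below only builds a search URL and does not contact NCBI. It rests on assumptions that go beyond the slides: the NCBI E-utilities `esearch.fcgi` endpoint, the `nucleotide` database name, and the `srcdb_refseq[prop]` filter as a stand-in for the RefSeq limit set in the web interface. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the E-utilities esearch endpoint (not mentioned in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", organism=None, refseq_only=False):
    """Build an Entrez search URL, optionally narrowed by organism and RefSeq."""
    parts = [term]
    if organism:
        # Step 1 from the slide: restrict by organism with a field tag.
        parts.append(f"{organism}[organism]")
    if refseq_only:
        # Step 2 from the slide: limit to RefSeq (filter name is an assumption).
        parts.append("srcdb_refseq[prop]")
    query = " AND ".join(parts)
    return ESEARCH + "?" + urlencode({"db": db, "term": query})


broad = build_esearch_url("hiv-1 pol")
narrow = build_esearch_url("pol", organism="hiv-1", refseq_only=True)
print(broad)
print(narrow)
```

Keeping each restriction as a separate AND-ed clause mirrors the slide, where each step cuts the result set further.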
91 only 1 RefSeq over 100,000 nucleotide entries for HIV-1
92 Examples of how to access sequence data: histone
   query for histone           # results
   protein records
   RefSeq entries              7544
   RefSeq (limit to human)     1108
   NOT deacetylase             697
   At this point, select a reasonable candidate (e.g. histone 2, H4) and follow its link to Entrez Gene. There, you can confirm you have the right gene/protein.
93
94 Access to Biomedical Literature Page 35
95 PubMed at NCBI to find literature information
96 PubMed is the NCBI gateway to MEDLINE. MEDLINE contains bibliographic citations and author abstracts from over 4,600 journals published in the United States and in 70 foreign countries. It has >14 million records dating back to Page 35
97 MeSH is the acronym for "Medical Subject Headings." MeSH is the list of the vocabulary terms used for subject analysis of biomedical literature at NLM. MeSH vocabulary is used for indexing journal articles for MEDLINE. The MeSH controlled vocabulary imposes uniformity and consistency to the indexing of biomedical literature. Page 35
98
99
100 PubMed search strategies: Try the tutorial (Education on the left sidebar). Use Boolean queries (capitalize AND, OR, NOT), e.g. lipocalin AND disease. Try using limits. Try Links to find Entrez information and external resources. Obtain articles on-line via Welch Medical Library (and download pdf files): Page 35
101 Boolean queries, where 1 = lipocalin and 2 = disease. 1 AND 2: lipocalin AND disease (60 results). 1 OR 2: lipocalin OR disease (1,650,000 results). 1 NOT 2: lipocalin NOT disease (530 results). 8/ Fig Page 34
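Boolean operators behave like set operations on the records matched by each term: AND is intersection, OR is union, NOT is difference. The Python sketch below uses small made-up sets of record IDs (not real PubMed IDs) to show the relationship. With the slide's counts, the same reasoning implies 60 + 530 = 590 records mention lipocalin in total.

```python
# Hypothetical record IDs matched by each search term (not real PMIDs).
lipocalin = {101, 102, 103, 104}
disease = {103, 104, 105, 106, 107}

and_hits = lipocalin & disease   # lipocalin AND disease: in both sets
or_hits = lipocalin | disease    # lipocalin OR disease: in either set
not_hits = lipocalin - disease   # lipocalin NOT disease: in the first set only

# Every lipocalin record either mentions disease or does not, so the
# AND and NOT results together account for all lipocalin records.
assert len(and_hits) + len(not_hits) == len(lipocalin)

print(sorted(and_hits), sorted(or_hits), sorted(not_hits))
```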
102 Search results versus article contents (8/06)
                                   Article contents:                            Article contents:
                                   globin is present                            globin is absent
   Search result:
   globin is found                 true positive                                false positive (article does not discuss globins)
   Search result:
   globin is not found             false negative (article discusses globins)   true negative
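The four cells of this table can be summarized with standard measures of search quality. The Python sketch below computes sensitivity (recall), specificity, and precision from the four counts. These measure names are standard terminology rather than something the slide states, and the counts are invented for illustration.

```python
def search_metrics(tp, fp, fn, tn):
    """Summarize a 2x2 table of search results versus article contents.

    tp: term found, article discusses the topic (true positive)
    fp: term found, article does not discuss it (false positive)
    fn: term not found, article discusses it (false negative)
    tn: term not found, article does not discuss it (true negative)
    """
    return {
        # Fraction of relevant articles that the search retrieved.
        "sensitivity": tp / (tp + fn),
        # Fraction of irrelevant articles that the search correctly left out.
        "specificity": tn / (tn + fp),
        # Fraction of retrieved articles that are actually relevant.
        "precision": tp / (tp + fp),
    }


# Hypothetical counts for a "globin" search over 1,000 articles.
metrics = search_metrics(tp=40, fp=10, fn=20, tn=930)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```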
103 WelchWeb is available at
104 Brian Brown and Carrie Iwema are the Welch Medical Library liaisons to the basic sciences
105 Course sponsors Dept. of Molecular Microbiology & Immunology, and Dept. of Biostatistics, School of Public Health
More informationPathway Analysis. Min Kim Bioinformatics Core Facility 2/28/2018
Pathway Analysis Min Kim Bioinformatics Core Facility 2/28/2018 Outline 1. Background 2. Databases: KEGG, Reactome, Biocarta, Gene Ontology, MSigDB, MetaCyc, SMPDB, IPA. 3. Statistical Methods: Overlap
More informationFollowing text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005
Bioinformatics is the recording, annotation, storage, analysis, and searching/retrieval of nucleic acid sequence (genes and RNAs), protein sequence and structural information. This includes databases of
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Brad Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org 7
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Brad Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org 8
More informationAnalysis of Microarray Data