EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science
2 Database. What is a database? An organized set of data. Can web pages, books, journal articles, tables, text files, and spreadsheet files be considered databases? Molecular biology databases exist to disseminate biological data and information, to provide biological data in computer-readable form, and to allow analysis of biological data. 2012/9/11 EECS 730 2
3 Biological information. Nucleic acids: DNA sequence, genes, gene products (proteins), mutations, gene coding, distribution patterns, motifs. Genomics: genome, gene structure and expression, genetic maps, genetic disorders. RNA: sequence, secondary structure, 3D structure, interactions. Proteins: protein sequence, corresponding gene, secondary structure, 3D structure, function, motifs, homology, interactions. Proteomics: expression profiles, proteins in disease processes, etc. Ligands and drugs: inhibitors, activators, substrates, metabolites.
4 Biological information (continued). Function: binding sites, interactions, molecular action (binding, chemical reaction, etc.), biological effect (signaling, transport, feedback, regulation, modification, etc.), functional relationships, protein families, motifs, and homologs. Pathways: molecular networks, biological chains of events, regulation, feedback, kinetic data.
5 Overview of molecular biology databases. Sequence databases for DNA: GenBank, EMBL (European Molecular Biology Laboratory), DDBJ (DNA Data Bank of Japan). Protein: Swiss-Prot, NCBI. Protein classification databases: PROSITE, Pfam, InterPro, Gene Ontology.
6 Overview of molecular biology databases (continued). Structure: PDB (Protein Data Bank; X-ray crystallography, NMR, modeling), KLOTHO (small molecules). Genome: Mouse Genome Database, yeast genome, bacterial genomes. Human genome browsers: NCBI, UCSC (genome.ucsc.edu), EBI, Celera.
7 Overview of molecular biology databases (continued). Genetic disorders: OMIM (Online Mendelian Inheritance in Man). Taxonomy. Literature: PubMed.
8 Data about databases
9 Molecular biology databases: nucleic acid sequence, genome data, protein sequence, protein classification, protein structure.
10 Nucleic acid databases. What information do these databases hold? DNA sequence, genes, gene products (proteins), mutations, gene coding, distribution patterns, motifs. Genomics: genome, gene structure and expression, genetic maps, genetic disorders. RNA: sequence, secondary structure, 3D structure, interactions.
11 Nucleic acid databases. DNA databases: GenBank, EMBL, DDBJ. 1. General-purpose databases focusing on DNA sequences and their properties. 2. GenBank, EMBL-Bank, and DDBJ exchange data to ensure comprehensive worldwide coverage, and accession numbers are managed consistently across the three centers.
12 Three major public DNA databases: EMBL, GenBank, DDBJ.
13 International Nucleotide Sequence Database Collaboration
14 EMBL nucleotide sequence database. EMBL contains nucleotide sequences collected from all public sources. It is accessible through the Sequence Retrieval System (SRS), which allows keyword searching. Sequence similarity search tools: Blitz, FASTA, and BLAST (studied later).
16 EMBL entry header. The ID line has the form: ID entryname dataclass; molecule; division; sequence length (BP).
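To make the ID line layout concrete, here is a minimal Python sketch that splits such a line into its five fields. It assumes exactly the layout on the slide (semicolon-separated fields, length followed by BP); newer EMBL releases use a different, longer ID line, so the regular expression `ID_PATTERN`, the `EmblId` type, and the example line are hypothetical illustrations rather than EMBL's specification.

```python
import re
from typing import NamedTuple


class EmblId(NamedTuple):
    entry_name: str
    data_class: str
    molecule: str
    division: str
    length_bp: int


# Assumed layout, taken from the slide: "ID entryname dataclass; molecule; division; length BP."
ID_PATTERN = re.compile(
    r"^ID\s+(\S+)\s+([^;]+);\s*([^;]+);\s*([^;]+);\s*(\d+)\s*BP\.?\s*$"
)


def parse_embl_id_line(line: str) -> EmblId:
    """Split an old-style EMBL ID line into its five fields."""
    match = ID_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not an ID line in the assumed format: {line!r}")
    name, data_class, molecule, division, length = match.groups()
    return EmblId(name, data_class.strip(), molecule.strip(), division.strip(), int(length))


# Hypothetical example line, not a real database record.
print(parse_embl_id_line("ID   TEST0001 standard; DNA; HUM; 1234 BP."))
```

Raising `ValueError` on a non-matching line, rather than returning partial fields, keeps malformed headers from slipping silently into downstream analysis.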
17 EMBL entry feature table: coding sequence
18 EMBL entry sequence
19 EMBL format. ID: IDentification. AC: Accession numbers, the primary means of identifying sequences, providing a stable way of identifying entries from release to release. DE: DEscription. KW: KeyWord information, which can be used to generate cross-reference indexes of the sequence entries based on functional, structural, or other categories deemed important. OS: Organism Species. OC: Organism Classification, the taxonomic classification of the source organism. OG: OrGanelle, indicating the subcellular location of non-nuclear sequences. SQ: SeQuence header, which marks the beginning of the sequence data and gives a summary of its content. The sequence data lines themselves have a line code consisting of two blanks.
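Because every line begins with a two-letter code, an entry can be read by grouping lines by that code. The Python sketch below does this and collects the sequence from the blank-prefixed lines after SQ. It assumes a `//` line ends the entry and that sequence lines carry trailing position numbers. The function name `parse_embl_entry` and the sample entry are hypothetical; for real work, an established parser such as Biopython's is the safer choice.

```python
from collections import defaultdict


def parse_embl_entry(text: str) -> tuple[dict[str, list[str]], str]:
    """Group the lines of one EMBL flat-file entry by their two-letter line code.

    Returns (fields, sequence). Lines after the SQ header that begin with two
    blanks are treated as sequence data: only their letters are kept, so the
    position counters at the end of each line are discarded.
    """
    fields: dict[str, list[str]] = defaultdict(list)
    sequence_parts: list[str] = []
    in_sequence = False
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("//"):  # assumed end-of-entry terminator
            break
        if in_sequence and line.startswith("  "):
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
            continue
        code, content = line[:2], line[2:].strip()
        if code == "SQ":
            in_sequence = True
        fields[code].append(content)
    return dict(fields), "".join(sequence_parts).upper()


# Hypothetical entry, invented for illustration only.
entry = """ID   TEST0001 standard; DNA; HUM; 20 BP.
AC   X00000;
DE   Hypothetical example entry.
OS   Homo sapiens
SQ   Sequence 20 BP;
     acgtacgtac gtacgtacgt        20
//
"""

fields, sequence = parse_embl_entry(entry)
print(fields["AC"], fields["OS"], sequence)
```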
20 What is an accession number? An accession number is a label used to identify a sequence: a string of letters and/or numbers that corresponds to a molecular sequence. Examples (all for retinol-binding protein, RBP4), given as accession: description (molecule):
X02775: GenBank genomic DNA sequence (DNA)
NT_: genomic contig (DNA)
rs: dbSNP, single nucleotide polymorphism (DNA)
N: an expressed sequence tag, 1 of 170 (DNA)
NM_: RefSeq DNA sequence, from a transcript (RNA)
NP_: RefSeq protein (protein)
AAC02945: GenBank protein (protein)
Q28369: SwissProt protein (protein)
1KT7: Protein Data Bank structure record (protein)
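Since the prefix or shape of an accession often hints at which database it came from, a rough classifier can be written with pattern matching. The Python sketch below is a heuristic only. The RefSeq prefixes come from these slides. The other patterns (dbSNP `rs`, 4-character PDB IDs, UniProt's published accession pattern, GenBank 3-letter+5-digit protein and 1-letter+5-digit or 2-letter+6-digit nucleotide accessions) are assumptions made for illustration. Real formats overlap, so the order of checks matters and some accessions will be misclassified.

```python
import re

# RefSeq prefixes as listed on the slides.
REFSEQ_PREFIXES = {
    "NC_": "RefSeq complete genome or chromosome",
    "NT_": "RefSeq genomic contig",
    "NM_": "RefSeq mRNA (DNA format)",
    "NP_": "RefSeq protein",
}

# Heuristic patterns, checked in order; formats overlap in practice.
PATTERNS = [
    (re.compile(r"^rs\d+$", re.IGNORECASE), "dbSNP variant"),
    (re.compile(r"^[1-9][A-Z0-9]{3}$", re.IGNORECASE), "PDB structure"),
    (re.compile(r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"),
     "UniProt/Swiss-Prot protein"),
    (re.compile(r"^[A-Z]{3}\d{5}$"), "GenBank protein"),
    (re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6})$"), "GenBank nucleotide"),
]


def classify_accession(accession: str) -> str:
    """Guess the source database of an accession from its prefix and shape."""
    acc = accession.strip().split(".")[0]  # drop a version suffix such as ".1"
    for prefix, label in REFSEQ_PREFIXES.items():
        if acc.upper().startswith(prefix):
            return label
    for pattern, label in PATTERNS:
        if pattern.match(acc):
            return label
    return "unknown"


for example in ["X02775", "AAC02945", "Q28369", "1KT7"]:
    print(example, "->", classify_accession(example))
```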
21 GenBank database. GenBank contains publicly available DNA sequences from more than 100,000 organisms. It also contains derived protein sequences and annotations describing biological, structural, and other relevant features. It is accessible through Entrez, NCBI's integrated retrieval system. Sequence similarity search tools: BLAST (studied later).
22 Number of base pairs in GenBank, up to the present (figure: base pairs in billions vs. year on a linear plot, and a semilogarithmic plot annotated with growth rates of 2-fold per 18 months and 10-fold per 5 years). These graphs provide one example of the rapidly accumulating data in biology, leading to entire new fields of study.
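The two growth rates annotated on the plot describe essentially the same exponential trend, which is why the data fall on a straight line on the semilogarithmic axis. A quick Python check (the helper names are ours): doubling every 18 months for 60 months gives 2^(60/18), about 10.08-fold, and a 10-fold rise over 60 months implies a doubling time of about 18.06 months.

```python
import math


def fold_change(doubling_months: float, span_months: float) -> float:
    """Fold increase after span_months of exponential growth with the given doubling time."""
    return 2 ** (span_months / doubling_months)


def doubling_time_months(fold: float, span_months: float) -> float:
    """Doubling time implied by a given fold increase over span_months."""
    return span_months * math.log(2) / math.log(fold)


print(round(fold_change(18, 60), 2))           # 2-fold per 18 months, over 5 years
print(round(doubling_time_months(10, 60), 2))  # 10-fold per 5 years, as a doubling time
```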
23 More than 100,000 species are represented in GenBank. All species: 128,941; viruses: 6,137; bacteria: 31,262; archaea: 2,100; eukaryota: 87,…
24 The most sequenced organisms in GenBank: Homo sapiens 10.7 billion bases; Mus musculus 6.5 billion; Rattus norvegicus 5.6 billion; Danio rerio 1.7 billion; Zea mays 1.4 billion; Oryza sativa 0.8 billion; Drosophila melanogaster 0.7 billion; Gallus gallus 0.5 billion; Arabidopsis thaliana 0.5 billion. Figures are from an updated GenBank release.
25 A GenBank entry: HEADER
26 A GenBank entry: FEATURES
27 A GenBank entry: SEQUENCE
28 Common sequence formats: EMBL release format, GenBank release format, and FASTA format. An example in FASTA format:
>X12345 Y098TR gene
CGTATCTTACGAGCTACTACGA
GGTCTTATCGGACGAGCGACT
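FASTA is simple enough to read by hand: a line starting with `>` opens a record and carries the description, and the following lines, up to the next `>`, are concatenated into the sequence. A minimal Python sketch (`parse_fasta` is our own name) applied to the example above; it keys records by the full header line, so a repeated header would overwrite the earlier record. Libraries such as Biopython's `Bio.SeqIO` handle the many edge cases this ignores.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Return {header: sequence} for every record in a FASTA string."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header line")
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # close the last record
    return records


example = """>X12345 Y098TR gene
CGTATCTTACGAGCTACTACGA
GGTCTTATCGGACGAGCGACT
"""

for name, seq in parse_fasta(example).items():
    print(name, len(seq), seq)
```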
29 FASTA format (textbook figure)
30 cDNA. cDNA: DNA that is synthesized to be complementary to an mRNA molecule. A cDNA represents the portion of the DNA that specifies a protein (the coding sequence of a gene). If the sequence of the cDNA is known, the sequence of the gene's exons is known. Introns, which are not translated, are not found in the cDNA: they are spliced out after the DNA is transcribed into RNA. (Figure: DNA -> RNA -> protein, with complementary DNA (cDNA) made from the RNA.)
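"Complementary" here means base-paired and antiparallel. In RNA, U pairs with A, so the first cDNA strand made from an mRNA is its reverse complement written in the DNA alphabet. The second strand then matches the mRNA with U replaced by T. A small Python sketch of that rule; the function names are ours.

```python
# Pairing rules from an RNA template to DNA: A->T, U->A, G->C, C->G.
RNA_TO_DNA_COMPLEMENT = str.maketrans("AUGC", "TACG")


def first_strand_cdna(mrna: str) -> str:
    """cDNA strand complementary to the mRNA, reversed so it reads 5'->3'."""
    mrna = mrna.upper()
    if set(mrna) - set("AUGC"):
        raise ValueError("expected an RNA sequence over A, U, G, C")
    return mrna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def second_strand_cdna(mrna: str) -> str:
    """The opposite cDNA strand: the mRNA sequence with U written as T."""
    return mrna.upper().replace("U", "T")


print(first_strand_cdna("AUGGCC"), second_strand_cdna("AUGGCC"))
```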
31 EST (Expressed Sequence Tag). Expressed Sequence Tags (ESTs) correspond to partial mRNA sequences of expressed genes. They are sequences of cDNA that have been reverse-transcribed from mRNA. They are short, and each is the result of a single sequencing experiment, so they have a high frequency of errors. They represent a snapshot of what is expressed in a given tissue at a given developmental stage.
32 dbEST (Expressed Sequence Tags database). dbEST is a division of GenBank that contains sequence data and other information on cDNA sequences, or ESTs, from a number of organisms.
33 EST (Expressed Sequence Tag). Applications: discovery of new genes, mapping of various genomes, and identification of coding regions in genomic sequences. EST libraries are used to answer questions such as: which genes are expressed in a specific cell or tissue?
34 One gene can have multiple EST sequences!
35 UniGene: unique genes. UniGene partitions GenBank sequences into a nonredundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and its map location. The majority of sequences are ESTs.
36 Cluster sizes in UniGene. A gene with 1 associated EST has a cluster size of 1; a gene with 10 associated ESTs has a cluster size of 10.
37 Cluster sizes in UniGene (human). (Table: cluster size vs. number of clusters; values not preserved.) UniGene build 172, 8/04.
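The cluster-size table is a two-level count. First, count the ESTs assigned to each cluster. Then count how many clusters have each size. The Python sketch below assumes a hypothetical mapping from EST identifiers to cluster identifiers. It only shows the tallying, not how UniGene actually builds its clusters.

```python
from collections import Counter


def cluster_sizes(est_to_cluster: dict[str, str]) -> Counter:
    """Number of ESTs assigned to each cluster (the cluster size)."""
    return Counter(est_to_cluster.values())


def size_distribution(sizes: Counter) -> dict[int, int]:
    """How many clusters have each size, like the cluster-size table."""
    return dict(sorted(Counter(sizes.values()).items()))


# Hypothetical assignments: geneA has 1 EST, geneB has 3 ESTs.
assignments = {"est1": "geneA", "est2": "geneB", "est3": "geneB", "est4": "geneB"}

sizes = cluster_sizes(assignments)
print(dict(sizes))
print(size_distribution(sizes))
```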
38 UniGene: unique genes via ESTs. Conclusion: UniGene is a useful tool for looking up information about expressed genes. UniGene displays information about the abundance of a transcript (expressed gene), as well as its regional distribution of expression (e.g. brain vs. liver). We will discuss UniGene further in the section on gene expression.
39 Using a database. How to get information out of a database: Browsing: no targeted information to retrieve. Searching: looking for particular information. Searching a database requires a key that identifies the element(s) of the database that are of interest: accession number, gene name, gene sequence, keyword (any word that occurs somewhere in the database records), or other information.
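Keyword searching of the kind described above is usually backed by an inverted index, which maps each word to the records that contain it. A minimal Python sketch follows. The records and accession labels are hypothetical, and the AND-of-all-words semantics is our choice for illustration, not how Entrez or SRS necessarily rank or combine terms.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9-]+")  # words may contain digits and hyphens, e.g. "hiv-1"


def build_keyword_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the accessions whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for accession, text in records.items():
        for word in TOKEN.findall(text.lower()):
            index[word].add(accession)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return accessions containing every query word (AND semantics)."""
    words = TOKEN.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical records keyed by made-up accession labels.
records = {
    "ACC1": "HIV-1 pol gene, partial sequence",
    "ACC2": "HIV-1 env gene",
    "ACC3": "retinol-binding protein 4",
}
index = build_keyword_index(records)
print(search(index, "hiv-1 pol"))
```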
40 NCBI and Entrez. One of the most useful and comprehensive sources of databases is NCBI, part of the National Library of Medicine. NCBI provides summaries, browsers for genome data, and search tools. Entrez is its database search interface; it can search on gene names, sequences, chromosomal locations, diseases, and keywords.
41 National Center for Biotechnology Information (NCBI)
42 Entrez integrates the scientific literature, DNA and protein sequence databases, 3D protein structure data, population study data sets, and assemblies of complete genomes.
43 Entrez is a search and retrieval system that integrates NCBI databases.
45 Example of how to access sequence data: HIV-1 pol. There are many possible approaches. Begin at the main page of NCBI and type an Entrez query: hiv-1 pol
47 Searching for HIV-1 pol: following the Genome link yields a manageable three results.
48 Example of how to access sequence data: HIV-1 pol. For the Entrez query hiv-1 pol there are about 40,000 nucleotide or protein records (and more than 100,000 records for a search for hiv-1), but these can easily be reduced in two steps: 1. specify the organism, e.g. hiv-1[organism]; 2. limit the output to RefSeq.
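The same narrowing can be expressed programmatically by composing an Entrez query string and sending it to NCBI's E-utilities ESearch endpoint. The Python sketch below only builds the URL and does not contact NCBI. The `[organism]` field tag mirrors the slide. Expressing "limit to RefSeq" as `refseq[filter]` is an assumption, so check it against NCBI's current search-field documentation. The helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_term(text: str, organism: Optional[str] = None, refseq_only: bool = False) -> str:
    """Combine free text with optional organism and RefSeq restrictions using AND."""
    parts = [text]
    if organism:
        parts.append(f"{organism}[organism]")
    if refseq_only:
        parts.append("refseq[filter]")  # assumed filter name; verify against NCBI docs
    return " AND ".join(parts)


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL for an ESearch request; fetching it would run the search."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


term = build_entrez_term("pol", organism="hiv-1", refseq_only=True)
print(term)
print(esearch_url("nucleotide", term))
```

Opening the printed URL (for example with `urllib.request.urlopen`) would perform the search. Biopython's `Bio.Entrez.esearch(db=..., term=...)` wraps the same service and also expects you to set `Entrez.email` before use.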
49 Over 100,000 nucleotide entries for HIV-1, but only 1 RefSeq entry.
50 NCBI's important RefSeq project: best representative sequences. The RefSeq collection aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcripts (RNA), and protein products, for major research organisms. It provides an expertly curated accession number that corresponds to the most stable, agreed-upon reference version of a sequence. RefSeq identifiers include the following formats:
Complete genome: NC_######
Complete chromosome: NC_######
Genomic contig: NT_######
mRNA (DNA format): NM_######
Protein: NP_######
51 Strategy for assessment of alternative multiple sequence alignment algorithms. 1. Create or obtain a database of protein sequences for which the 3D structure is known, so that true homologs can be defined using structural criteria. BAliBASE is a reference alignment resource with over 1,000 sequences in 142 alignments. 2. Try making multiple sequence alignments with many different sets of proteins (very related, very distant, few gaps, many gaps, insertions, outliers). 3. Compare the answers.
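One common way to "compare the answers" is a sum-of-pairs (SP) score. It measures the fraction of residue pairs aligned in the reference alignment that the test alignment also aligns. The Python sketch below is a simplified version of that idea. It is not BAliBASE's own scoring program. It assumes both alignments list the same sequences in the same order, with `-` as the only gap character. The toy alignments are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """Residue pairs (seq_i, residue_i, seq_j, residue_j) that share a column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    positions = [0] * len(alignment)  # index of the next ungapped residue in each row
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((seq_idx, positions[seq_idx]))
                positions[seq_idx] += 1
        for (i, ri), (j, rj) in combinations(residues, 2):
            pairs.add((i, ri, j, rj))
    return pairs


def sum_of_pairs_score(test: list[str], reference: list[str]) -> float:
    """Fraction of reference-aligned residue pairs reproduced by the test alignment."""
    ungapped = lambda aln: [row.replace("-", "") for row in aln]
    if ungapped(test) != ungapped(reference):
        raise ValueError("test and reference must align the same sequences")
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


reference = ["AC-GT", "ACAGT"]  # hypothetical "true" alignment
test = ["ACG-T", "ACAGT"]       # hypothetical program output
print(sum_of_pairs_score(test, reference))
```

Here three of the four reference pairs are recovered, so the score is 0.75. A stricter alternative, the total-column score, credits only columns that are reproduced exactly.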
52 Acknowledgment. Many of the images and slides in this PowerPoint presentation are from Bioinformatics and Functional Genomics by Jonathan Pevsner. Copyright 2003 by John Wiley & Sons, Inc.
More informationSequence Databases. Chapter 2. caister.com/bioinformaticsbooks. Paul Rangel. Sequence Databases
Chapter 2 Paul Rangel Abstract DNA and Protein sequence databases are the cornerstone of bioinformatics research. DNA databases such as GenBank and EMBL accept genome data from sequencing projects around
More informationBIOINFORMATICS Introduction
BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu What is Bioinformatics? (Molecular) Bio -informatics One idea
More informationIntroduction to Bioinformatics Part 1 of 2
Introduction to Bioinformatics Part 1 of 2 CS 91.510 January 28, 2004 Georges Grinstein, Ph.D. grinstein@cs.uml.edu Copyright notice Many of the images in this PowerPoint presentation are from Bioinformatics
More informationOntologies - Useful tools in Life Sciences and Forensics
Ontologies - Useful tools in Life Sciences and Forensics How today's Life Science Technologies can shape the Crime Sciences of tomorrow 04.07.2015 Dirk Labudde Mittweida Mittweida 2 Watson vs Watson Dr.
More informationGREG GIBSON SPENCER V. MUSE
A Primer of Genome Science ience THIRD EDITION TAGCACCTAGAATCATGGAGAGATAATTCGGTGAGAATTAAATGGAGAGTTGCATAGAGAACTGCGAACTG GREG GIBSON SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc.
More informationIdentification of Single Nucleotide Polymorphisms and associated Disease Genes using NCBI resources
Identification of Single Nucleotide Polymorphisms and associated Disease Genes using NCBI resources Navreet Kaur M.Tech Student Department of Computer Engineering. University College of Engineering, Punjabi
More informationFunction Prediction of Proteins from their Sequences with BAR 3.0
Open Access Annals of Proteomics and Bioinformatics Short Communication Function Prediction of Proteins from their Sequences with BAR 3.0 Giuseppe Profiti 1,2, Pier Luigi Martelli 2 and Rita Casadio 2
More informationGenomic region (ENCODE) Gene definitions
DNA From genes to proteins Bioinformatics Methods RNA PROMOTER ELEMENTS TRANSCRIPTION Iosif Vaisman mrna SPLICE SITES SPLICING Email: ivaisman@gmu.edu START CODON STOP CODON TRANSLATION PROTEIN From genes
More informationSequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University
Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Usage scenarios for sequence based function annotation Function prediction of newly cloned
More informationJones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION. Bioinformatics and Genomic Data: Investigating a Complex Genetic Disease
ones FOR Photodisc/Getty Sashkin/ShutterStock, Images.Inc. rnin STR 1 Chapter Bioinformatics and Genomic Data: Investigating a Complex Genetic Disease nes FOR Chapter Overview This is a skill-development
More informationPrimePCR Assay Validation Report
Gene Information Gene Name laminin, beta 3 Gene Symbol Organism Gene Summary Gene Aliases RefSeq Accession No. UniGene ID Ensembl Gene ID LAMB3 Human The product encoded by this gene is a laminin that
More informationLecture #1. Introduction to microarray technology
Lecture #1 Introduction to microarray technology Outline General purpose Microarray assay concept Basic microarray experimental process cdna/two channel arrays Oligonucleotide arrays Exon arrays Comparing
More informationFiles for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz]
BLAST Exercise: Detecting and Interpreting Genetic Homology Adapted by W. Leung and SCR Elgin from Detecting and Interpreting Genetic Homology by Dr. J. Buhler Prequisites: None Resources: The BLAST web
More informationONLINE BIOINFORMATICS RESOURCES
Dedan Githae Email: d.githae@cgiar.org BecA-ILRI Hub; Nairobi, Kenya 16 May, 2014 ONLINE BIOINFORMATICS RESOURCES Introduction to Molecular Biology and Bioinformatics (IMBB) 2014 The larger picture.. Lower
More informationIntroduction to Molecular Biology Databases
Introduction to Molecular Biology Databases Laboratorio de Bioinformática Centro de Astrobiología INTA-CSIC Centro de Astrobiología PRESENT BIOLOGY RESEARCH Data sources Genome sequencing projects: genome
More informationRegulation of eukaryotic transcription:
Promoter definition by mass genome annotation data: in silico primer extension EMBNET course Bioinformatics of transcriptional regulation Jan 28 2008 Christoph Schmid Regulation of eukaryotic transcription:
More information2. Outline the levels of DNA packing in the eukaryotic nucleus below next to the diagram provided.
AP Biology Reading Packet 6- Molecular Genetics Part 2 Name Chapter 19: Eukaryotic Genomes 1. Define the following terms: a. Euchromatin b. Heterochromatin c. Nucleosome 2. Outline the levels of DNA packing
More informationChapter 5. explain how information is submitted to and processed by biological databases.
Introduction to Databases The computer belongs on the benchtop in the modern biology lab, along with other essential equipment. A network of online databases provides researchers with quick access to information
More information