What I hope you ll learn. Introduction to NCBI & Ensembl tools including BLAST and database searching!
|
|
- Cora Haynes
- 5 years ago
- Views:
Transcription
1 What I hope you ll learn Introduction to NCBI & Ensembl tools including BLAST and database searching What do we learn from database searching and sequence alignments What tools are available at NCBI What tools are available through Ensembl Incorporating Bioinformatics into the High School Biology Curriculum Fran Lewitter, Ph.D. Director, Bioinformatics & Research Computing Whitehead Institute for Biomedical Research Lewitter AT wi.mit.edu Lewitter-Whitehead Institute August 15, Lewitter-Whitehead Institute August 15, First some background info Lewitter-Whitehead Institute August 15,
2 Simian sarcoma virus onc gene, v-sis, is derived from the gene (or genes) encoding a platelet-derived growth factor. Doolittle RF, Hunkapiller MW, Hood LE, Devare SG, Robbins KC, Aaronson SA, Antoniades HN. Science 221: , Why do alignments Use sequence similarity to infer homology and/or structural similarity between 2 or more genes/ proteins Identify more conserved regions of a protein, potentially identifying regions of most functional importance Compare and contrast homologs (perhaps into groups) based on shared positions or regions Infer evolutionary distance from sequence dissimilarity Lewitter-Whitehead Institute August 15, Lewitter-Whitehead Institute August 15, Evolutionary Basis of Sequence Alignment Similarity - observable quantity, such as percent identity Homology - conclusion drawn from data that two genes share a common evolutionary history; no metric is associated with this Paralog genes related by duplication Ortholog genes related by speciation Lewitter-Whitehead Institute August 15, 2012 Local vs Global Alignment LOCAL GLOBAL 7 Lewitter-Whitehead Institute August 15, 2012 From Mount, Bioinformatics, 2004, pg 71 8
3 Nucleotide vs Protein If comparing protein coding genes, use protein sequences because of less noise If protein sequences are very similar, it might be more instructive to use DNA sequences Example of simple scoring system for nucleic acids Match = +1 (ex. A-A, T-T, C-C, G-G) Mismatch = -1 (ex. A-T, A-C, etc) Gap opening = - 5 Gap extension = -2 T C A G A C G A G T G T C G G A - - G C T G = -4 Lewitter-Whitehead Institute August 15, Lewitter-Whitehead Institute August 15, Possible Alignments A:T C A G A C G A G T G B:T C G G A G C T G I. T C A G A C G A G T G -4 T C G G A - - G C T G II. T C A G A C G A G T G -5 T C G G A - G C - T G III.T C A G A C G A G T G -5 T C G G A - G - C T G Scoring for Protein Alignments - Amino Acid Substitution Matrices PAM - point accepted mutation based on global alignment [evolutionary model] BLOSUM - block substitutions based on local alignments [similarity among conserved sequences] Lewitter-Whitehead Institute August 15, Lewitter-Whitehead Institute August 15,
4 AA Scoring Matrices Substitution Matrices PAM - point accepted mutation based on global alignment [evolutionary model] Log-odds= pair in homologous proteins pair in unrelated proteins by chance BLOSUM - block substitutions based on local alignments [similarity among conserved sequences] Log-odds= obs freq of aa substitutions freq expected by chance BLOSUM 30 BLOSUM 62 BLOSUM 80 % identity Increasing similarity PAM 250 (80) PAM 120 (66) PAM 90 (50) % change Lewitter-Whitehead Institute August 15, Lewitter-Whitehead Institute August 15, Scoring for BLAST Alignments Score = 94.0 bits (230), Expect = 6e-19 Identities = 45/101 (44%), Positives = 54/101 (52%), Gaps = 7/101 (6%) Query: 204 YTGPFCDV----DTKASCYDGRGLSYRGLARTTLSGAPCQPWASEATYRNVTAEQ---AR 256 Y+ FC + + CY G G +YRG T SGA C PW S V Q A+ Sbjct: 198 YSSEFCSTPACSEGNSDCYFGNGSAYRGTHSLTESGASCLPWNSMILIGKVYTAQNPSAQ 257 Query: 257 NWGLGGHAFCRNPDNDIRPWCFVLNRDRLSWEYCDLAQCQT 297 GLG H +CRNPD D +PWC VL RL+WEYCD+ C T Sbjct: 258 ALGLGKHNYCRNPDGDAKPWCHVLKNRRLTWEYCDVPSCST 298 Based on BLOSUM62 Position 1: Y - Y = 7 Position 2: T - S = 1 Position 3: G - S = 0 Position 4: P - E = Position 9: - - P = -11 Position 10: - - A = Sum 230 Statistical Significance Raw Scores - score of an alignment equal to the sum of substitution and gap scores. Bit scores - scaled version of an alignment s raw score that accounts for the statistical properties of the scoring system used. E-value - expected number of distinct alignments that would achieve a given score by chance. Lower E-value => more significant. Lewitter-Whitehead Institute August 15, Lewitter-Whitehead Institute August 15,
5 A formula E = Kmn e -λs This is the Expected number of high-scoring segment pairs (HSPs) with score at least S for sequences of length m and n. This is the E value for the score S. The parameters K and λ can be thought of simply as natural scales for the search space size and the scoring system respectively. What s significant? High confidence - >40% identity for long alignments (Rost, 1999 found that sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity >40%) Twilight zone blurry % identity Midnight zone - <20% identity Lewitter-Whitehead Institute August 15, Lewitter-Whitehead Institute August 15, Tools of Interest NCBI Home Page NCBI (US) - BLAST Entrez EBI/EMBL/WTSI (Europe) Ensembl ( Lewitter-Whitehead Institute August 15, Lewitter-Whitehead Institute August 15,
6 Ensembl Home Page Hands-On Entrez finding sequences BLAST look for similar sequences Ensembl finding sequences Lewitter-Whitehead Institute August 15, Lewitter-Whitehead Institute August 15, How is BLAST WIBR? Looking for homologs Identifying domains in proteins Peptide identification Genome annotation Remember 1. Don t be afraid to look at help pages for each website Click, click, click 2. Any analysis tool will give you results 3. Interpret results you find Lewitter-Whitehead Institute August 15, Lewitter-Whitehead Institute August 15,
Data Retrieval from GenBank
Data Retrieval from GenBank Peter J. Myler Bioinformatics of Intracellular Pathogens JNU, Feb 7-0, 2009 http://www.ncbi.nlm.nih.gov (January, 2007) http://ncbi.nlm.nih.gov/sitemap/resourceguide.html Accessing
More informationProtein Sequence Analysis. BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl)
Protein Sequence Analysis BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl) Linear Sequence Analysis What can you learn from a (single) protein sequence? Calculate it s physical
More informationBLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments
BLAST 100 times faster than dynamic programming. Good for database searches. Derive a list of words of length w from query (e.g., 3 for protein, 11 for DNA) High-scoring words are compared with database
More informationBIOINFORMATICS IN BIOCHEMISTRY
BIOINFORMATICS IN BIOCHEMISTRY Bioinformatics a field at the interface of molecular biology, computer science, and mathematics Bioinformatics focuses on the analysis of molecular sequences (DNA, RNA, and
More informationBasic Bioinformatics: Homology, Sequence Alignment,
Basic Bioinformatics: Homology, Sequence Alignment, and BLAST William S. Sanders Institute for Genomics, Biocomputing, and Biotechnology (IGBB) High Performance Computing Collaboratory (HPC 2 ) Mississippi
More informationSequence Based Function Annotation
Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Sequence Based Function Annotation 1. Given a sequence, how to predict its biological
More informationFUNCTIONAL BIOINFORMATICS
Molecular Biology-2018 1 FUNCTIONAL BIOINFORMATICS PREDICTING THE FUNCTION OF AN UNKNOWN PROTEIN Suppose you have found the amino acid sequence of an unknown protein and wish to find its potential function.
More informationWhy learn sequence database searching? Searching Molecular Databases with BLAST
Why learn sequence database searching? Searching Molecular Databases with BLAST What have I cloned? Is this really!my gene"? Basic Local Alignment Search Tool How BLAST works Interpreting search results
More informationComparative Bioinformatics. BSCI348S Fall 2003 Midterm 1
BSCI348S Fall 2003 Midterm 1 Multiple Choice: select the single best answer to the question or completion of the phrase. (5 points each) 1. The field of bioinformatics a. uses biomimetic algorithms to
More informationOutline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases
Chapter 7: Similarity searches on sequence databases All science is either physics or stamp collection. Ernest Rutherford Outline Why is similarity important BLAST Protein and DNA Interpreting BLAST Individualizing
More informationCreation of a PAM matrix
Rationale for substitution matrices Substitution matrices are a way of keeping track of the structural, physical and chemical properties of the amino acids in proteins, in such a fashion that less detrimental
More informationBioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine
Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Overview This lecture will
More informationBioinformatics for Biologists. Comparative Protein Analysis
Bioinformatics for Biologists Comparative Protein nalysis: Part I. Phylogenetic Trees and Multiple Sequence lignments Robert Latek, PhD Sr. Bioinformatics Scientist Whitehead Institute for Biomedical Research
More informationWhy study sequence similarity?
Sequence Similarity Why study sequence similarity? Possible indication of common ancestry Similarity of structure implies similar biological function even among apparently distant organisms Example context:
More informationSequence Analysis. II: Sequence Patterns and Matrices. George Bell, Ph.D. WIBR Bioinformatics and Research Computing
Sequence Analysis II: Sequence Patterns and Matrices George Bell, Ph.D. WIBR Bioinformatics and Research Computing Sequence Patterns and Matrices Multiple sequence alignments Sequence patterns Sequence
More informationEvolutionary Genetics. LV Lecture with exercises 6KP
Evolutionary Genetics LV 25600-01 Lecture with exercises 6KP HS2017 >What_is_it? AATGATACGGCGACCACCGAGATCTACACNNNTC GTCGGCAGCGTC 2 NCBI MegaBlast search (09/14) 3 NCBI MegaBlast search (09/14) 4 Submitted
More informationTextbook Reading Guidelines
Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum Last updated: May 1, 2009 Textbook Reading Guidelines Preface: Read the whole preface, and especially: For the students with Life Science
More informationScoring Alignments. Genome 373 Genomic Informatics Elhanan Borenstein
Scoring Alignments Genome 373 Genomic Informatics Elhanan Borenstein A quick review Course logistics Genomes (so many genomes) The computational bottleneck Python: Programs, input and output Number and
More informationCAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools
CAP 5510: Introduction to Bioinformatics : Bioinformatics Tools ECS 254A / EC 2474; Phone x3748; Email: giri@cis.fiu.edu My Homepage: http://www.cs.fiu.edu/~giri http://www.cs.fiu.edu/~giri/teach/bioinfs15.html
More informationG4120: Introduction to Computational Biology
G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Lecture 3 February 13, 2003 Copyright 2003 Oliver Jovanovic, All Rights Reserved. Bioinformatics
More informationThe String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem.
Dec-82 Oct-84 Aug-86 Jun-88 Apr-90 Feb-92 Nov-93 Sep-95 Jul-97 May-99 Mar-01 Jan-03 Nov-04 Sep-06 Jul-08 May-10 Mar-12 Growth of GenBank 160,000,000,000 180,000,000 Introduction to Bioinformatics Iosif
More informationMaking Sense of DNA and Protein Sequences. Lily Wang, PhD Department of Biostatistics Vanderbilt University
Making Sense of DNA and Protein Sequences Lily Wang, PhD Department of Biostatistics Vanderbilt University 1 Outline Biological background Major biological sequence databanks Basic concepts in sequence
More informationMatch the Hash Scores
Sort the hash scores of the database sequence February 22, 2001 1 Match the Hash Scores February 22, 2001 2 Lookup method for finding an alignment position 1 2 3 4 5 6 7 8 9 10 11 protein 1 n c s p t a.....
More informationCAP 5510/CGS 5166: Bioinformatics & Bioinformatic Tools GIRI NARASIMHAN, SCIS, FIU
CAP 5510/CGS 5166: Bioinformatics & Bioinformatic Tools GIRI NARASIMHAN, SCIS, FIU !2 Sequence Alignment! Global: Needleman-Wunsch-Sellers (1970).! Local: Smith-Waterman (1981) Useful when commonality
More informationCAP 5510/CGS 5166: Bioinformatics & Bioinformatic Tools GIRI NARASIMHAN, SCIS, FIU
CAP 5510/CGS 5166: Bioinformatics & Bioinformatic Tools GIRI NARASIMHAN, SCIS, FIU Three major public DNA databases!2! GenBank NCBI (Natl Center for Biotechnology Information) www.ncbi.nlm.nih.gov! EMBL
More informationBig picture and history
Big picture and history (and Computational Biology) CS-5700 / BIO-5323 Outline 1 2 3 4 Outline 1 2 3 4 First to be databased were proteins The development of protein- s (Sanger and Tuppy 1951) led to the
More informationENGR 213 Bioengineering Fundamentals April 25, A very coarse introduction to bioinformatics
A very coarse introduction to bioinformatics In this exercise, you will get a quick primer on how DNA is used to manufacture proteins. You will learn a little bit about how the building blocks of these
More informationDatabase Searching and BLAST Dannie Durand
Computational Genomics and Molecular Biology, Fall 2013 1 Database Searching and BLAST Dannie Durand Tuesday, October 8th Review: Karlin-Altschul Statistics Recall that a Maximal Segment Pair (MSP) is
More informationTypically, to be biologically related means to share a common ancestor. In biology, we call this homologous
Typically, to be biologically related means to share a common ancestor. In biology, we call this homologous. Two proteins sharing a common ancestor are said to be homologs. Homologyoften implies structural
More informationBasic Local Alignment Search Tool
14.06.2010 Table of contents 1 History History 2 global local 3 Score functions Score matrices 4 5 Comparison to FASTA References of BLAST History the program was designed by Stephen W. Altschul, Warren
More informationQuestion 2: There are 5 retroelements (2 LINEs and 3 LTRs), 6 unclassified elements (XDMR and XDMR_DM), and 7 satellite sequences.
Bio4342 Exercise 1 Answers: Detecting and Interpreting Genetic Homology (Answers prepared by Wilson Leung) Question 1: Low complexity DNA can be described as sequences that consist primarily of one or
More informationBioinformatic analysis of similarity to allergens. Mgr. Jan Pačes, Ph.D. Institute of Molecular Genetics, Academy of Sciences, CR
Bioinformatic analysis of similarity to allergens Mgr. Jan Pačes, Ph.D. Institute of Molecular Genetics, Academy of Sciences, CR Scope of the work Method for allergenicity search used by FAO/WHO Analysis
More informationBioinformatics. Microarrays: designing chips, clustering methods. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute
Bioinformatics Microarrays: designing chips, clustering methods Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Course Syllabus Jan 7 Jan 14 Jan 21 Jan 28 Feb 4 Feb 11 Feb 18 Feb 25 Sequence
More informationMATH 5610, Computational Biology
MATH 5610, Computational Biology Lecture 2 Intro to Molecular Biology (cont) Stephen Billups University of Colorado at Denver MATH 5610, Computational Biology p.1/24 Announcements Error on syllabus Class
More informationBioinformatics for proteomics
Purdue-UAB Botanicals Center for Age-Related Disease Bioinformatics for proteomics Stephen Barnes, PhD Purdue-UAB Botanical Center Workshop 2002 Mass Spectrometry Methods in Botanicals Research Objectives
More informationDynamic Programming Algorithms
Dynamic Programming Algorithms Sequence alignments, scores, and significance Lucy Skrabanek ICB, WMC February 7, 212 Sequence alignment Compare two (or more) sequences to: Find regions of conservation
More informationBioinformatics Tools for RNA Data Analysis. Joseph Santos Bloomfield Tech High School Bloomfield, New Jersey
Bioinformatics Tools for RNA Data Analysis Joseph Santos Bloomfield Tech High School Bloomfield, New Jersey 1 Contents What is Bioinformatics? Vocabulary (with metaphors) 1D-- --Sequence (BLAST) 2D-- --Secondary
More informationBIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP
Jasper Decuyper BIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP MB&C2017 Workshop Bioinformatics for dummies 2 INTRODUCTION Imagine your workspace without the computers Both in research laboratories and in
More informationJust the Facts: A Basic Introduction to the Science Underlying NCBI Resources
National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools News About NCBI Site Map
More informationGenomic Annotation Lab Exercise By Jacob Jipp and Marian Kaehler Luther College, Department of Biology Genomics Education Partnership 2010
Genomic Annotation Lab Exercise By Jacob Jipp and Marian Kaehler Luther College, Department of Biology Genomics Education Partnership 2010 Genomics is a new and expanding field with an increasing impact
More informationIntroduction to BIOINFORMATICS
Introduction to BIOINFORMATICS Antonella Lisa CABGen Centro di Analisi Bioinformatica per la Genomica Tel. 0382-546361 E-mail: lisa@igm.cnr.it http://www.igm.cnr.it/pagine-personali/lisa-antonella/ What
More informationMOL204 Exam Fall 2015
MOL204 Exam Fall 2015 Exercise 1 15 pts 1. 1A. Define primary and secondary bioinformatical databases and mention two examples of primary bioinformatical databases and one example of a secondary bioinformatical
More informationGene-centered resources at NCBI
COURSE OF BIOINFORMATICS a.a. 2014-2015 Gene-centered resources at NCBI We searched Accession Number: M60495 AT NCBI Nucleotide Gene has been implemented at NCBI to organize information about genes, serving
More informationUNIVERSITY OF KWAZULU-NATAL EXAMINATIONS: MAIN, SUBJECT, COURSE AND CODE: GENE 320: Bioinformatics
UNIVERSITY OF KWAZULU-NATAL EXAMINATIONS: MAIN, 2010 SUBJECT, COURSE AND CODE: GENE 320: Bioinformatics DURATION: 3 HOURS TOTAL MARKS: 125 Internal Examiner: Dr. Ché Pillay External Examiner: Prof. Nicola
More informationIntroduction to Bioinformatics CPSC 265. What is bioinformatics? Textbooks
Introduction to Bioinformatics CPSC 265 Thanks to Jonathan Pevsner, Ph.D. Textbooks Johnathan Pevsner, who I stole most of these slides from (thanks!) has written a textbook, Bioinformatics and Functional
More informationBME 110 Midterm Examination
BME 110 Midterm Examination May 10, 2011 Name: (please print) Directions: Please circle one answer for each question, unless the question specifies "circle all correct answers". You can use any resource
More informationSequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University
Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Usage scenarios for sequence based function annotation Function prediction of newly cloned
More informationFiles for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz]
BLAST Exercise: Detecting and Interpreting Genetic Homology Adapted by W. Leung and SCR Elgin from Detecting and Interpreting Genetic Homology by Dr. J. Buhler Prequisites: None Resources: The BLAST web
More informationPRESENTING SEQUENCES 5 GAATGCGGCTTAGACTGGTACGATGGAAC 3 3 CTTACGCCGAATCTGACCATGCTACCTTG 5
Molecular Biology-2017 1 PRESENTING SEQUENCES As you know, sequences may either be double stranded or single stranded and have a polarity described as 5 and 3. The 5 end always contains a free phosphate
More informationGrundlagen der Bioinformatik Summer Lecturer: Prof. Daniel Huson
Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 11, 2011 1 1 Introduction Grundlagen der Bioinformatik Summer 2011 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a) 1.1
More informationTypes of Databases - By Scope
Biological Databases Bioinformatics Workshop 2009 Chi-Cheng Lin, Ph.D. Department of Computer Science Winona State University clin@winona.edu Biological Databases Data Domains - By Scope - By Level of
More informationCOMPUTER RESOURCES II:
COMPUTER RESOURCES II: Using the computer to analyze data, using the internet, and accessing online databases Bio 210, Fall 2006 Linda S. Huang, Ph.D. University of Massachusetts Boston In the first computer
More informationExercise I, Sequence Analysis
Exercise I, Sequence Analysis atgcacttgagcagggaagaaatccacaaggactcaccagtctcctggtctgcagagaagacagaatcaacatgagcacagcaggaaaa gtaatcaaatgcaaagcagctgtgctatgggagttaaagaaacccttttccattgaggaggtggaggttgcacctcctaaggcccatgaagt
More informationHot Topics. What s New with BLAST?
Hot Topics What s New with BLAST? Slides based on NCBI talk at American Society of Human Genetics October 2005 Hot Topics Outline I. New BLAST Algorithm: Discontiguous MegaBLAST II. New Databases III.
More informationBIOINFORMATICS TO ANALYZE AND COMPARE GENOMES
BIOINFORMATICS TO ANALYZE AND COMPARE GENOMES We sequenced and assembled a genome, but this is only a long stretch of ATCG What should we do now? 1. find genes What are the starting and end points for
More informationChristian Sigrist. January 27 SIB Protein Bioinformatics course 2016 Basel 1
Christian Sigrist January 27 SIB Protein Bioinformatics course 2016 Basel 1 General Definition on Conserved Regions Conserved regions in proteins can be classified into 5 different groups: Domains: specific
More informationThe human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28.
Data mining in Ensembl with BioMart Worked Example The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28. Which other genes related to human
More informationAnnotation and the analysis of annotation terms. Brian J. Knaus USDA Forest Service Pacific Northwest Research Station
Annotation and the analysis of annotation terms. Brian J. Knaus USDA Forest Service Pacific Northwest Research Station 1 Library preparation Sequencing Hypothesis testing Bioinformatics 2 Why annotate?
More informationChimp Sequence Annotation: Region 2_3
Chimp Sequence Annotation: Region 2_3 Jeff Howenstein March 30, 2007 BIO434W Genomics 1 Introduction We received region 2_3 of the ChimpChunk sequence, and the first step we performed was to run RepeatMasker
More informationBIO4342 Lab Exercise: Detecting and Interpreting Genetic Homology
BIO4342 Lab Exercise: Detecting and Interpreting Genetic Homology Jeremy Buhler March 15, 2004 In this lab, we ll annotate an interesting piece of the D. melanogaster genome. Along the way, you ll get
More informationCollect, analyze and synthesize. Annotation. Annotation for D. virilis. Evidence Based Annotation. GEP goals: Evidence for Gene Models 08/22/2017
Annotation Annotation for D. virilis Chris Shaffer July 2012 l Big Picture of annotation and then one practical example l This technique may not be the best with other projects (e.g. corn, bacteria) l
More informationBLAST Basics. ... Elements of Bioinformatics Spring, Tom Carter. tom/
BLAST Basics...... Elements of Bioinformatics Spring, 2003 Tom Carter http://astarte.csustan.edu/ tom/ March, 2003 1 Sequence Comparison One of the fundamental tasks we would like to do in bioinformatics
More informationGiri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748
CP 551: Introduction to Bioinformatics iri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs8.html 1/29/8 CP551 1 enomic Databases Entrez Portal at National Center for
More informationCollect, analyze and synthesize. Annotation. Annotation for D. virilis. GEP goals: Evidence Based Annotation. Evidence for Gene Models 12/26/2018
Annotation Annotation for D. virilis Chris Shaffer July 2012 l Big Picture of annotation and then one practical example l This technique may not be the best with other projects (e.g. corn, bacteria) l
More informationELE4120 Bioinformatics. Tutorial 5
ELE4120 Bioinformatics Tutorial 5 1 1. Database Content GenBank RefSeq TPA UniProt 2. Database Searches 2 Databases A common situation for alignment is to search through a database to retrieve the similar
More informationSequence Alignments. Week 3
Sequence Alignments Week 3 Independent Project Gene Due: 9/25 (Monday--must be submitted by email) Rough Draft Due: 11/13 (hard copy due at the beginning of class, and emailed to me) Final Version Due:
More informationIntegrated UG/PG Biotechnology (Fifth Semester) End Semester Examination, 2013 LBTC 504: Bioinformatics. Model Answers
Integrated UG/PG Biotechnology (Fifth Semester) End Semester Examination, 2013 LBTC 504: Bioinformatics Model Answers Answer Key for Section-A (Objective Type Questions) Question 1. (i) (c) Both of the
More informationSingle alignment: FASTA. 17 march 2017
Single alignment: FASTA 17 march 2017 FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985.[1] FASTA is pronounced
More informationBioinformatics for Biologists
Bioinformatics for Biologists Microarray Data Analysis. Lecture 1. Fran Lewitter, Ph.D. Director Bioinformatics and Research Computing Whitehead Institute Outline Introduction Working with microarray data
More informationCommunity-assisted genome annotation: The Pseudomonas example. Geoff Winsor, Simon Fraser University Burnaby (greater Vancouver), Canada
Community-assisted genome annotation: The Pseudomonas example Geoff Winsor, Simon Fraser University Burnaby (greater Vancouver), Canada Overview Pseudomonas Community Annotation Project (PseudoCAP) Past
More information03-511/711 Computational Genomics and Molecular Biology, Fall
03-511/711 Computational Genomics and Molecular Biology, Fall 2010 1 Study questions These study problems are intended to help you to review for the final exam. This is not an exhaustive list of the topics
More informationIdentifying Genes and Pseudogenes in a Chimpanzee Sequence Adapted from Chimp BAC analysis: TWINSCAN and UCSC Browser by Dr. M.
Identifying Genes and Pseudogenes in a Chimpanzee Sequence Adapted from Chimp BAC analysis: TWINSCAN and UCSC Browser by Dr. M. Brent Prerequisites: A Simple Introduction to NCBI BLAST Resources: The GENSCAN
More informationIntroduction to Bioinformatics Problem Set 7: Genome Analysis
Introduction to Bioinformatics Problem Set 7: Genome Analysis 1. You are examining the sequence of a newly sequenced phage and run across the following features. In each case, you're wondering whether
More information03-511/711 Computational Genomics and Molecular Biology, Fall
03-511/711 Computational Genomics and Molecular Biology, Fall 2011 1 Study questions These study problems are intended to help you to review for the final exam. This is not an exhaustive list of the topics
More informationProtein Bioinformatics Part I: Access to information
Protein Bioinformatics Part I: Access to information 260.655 April 6, 2006 Jonathan Pevsner, Ph.D. pevsner@kennedykrieger.org Outline [1] Proteins at NCBI RefSeq accession numbers Cn3D to visualize structures
More informationDeleterious mutations
Deleterious mutations Mutation is the basic evolutionary factor which generates new versions of sequences. Some versions (e.g. those concerning genes, lets call them here alleles) can be advantageous,
More informationUCSC Genome Browser. Introduction to ab initio and evidence-based gene finding
UCSC Genome Browser Introduction to ab initio and evidence-based gene finding Wilson Leung 06/2006 Outline Introduction to annotation ab initio gene finding Basics of the UCSC Browser Evidence-based gene
More informationGenomics I. Organization of the Genome
Genomics I Organization of the Genome Outline Organization of genome Genomes, chromosomes, genes, exons, introns, promoters, enhancers, etc. Databases Why do we need them? How do we access them? What can
More informationA History of Bioinformatics: Development of in silico Approaches to Evaluate Food Proteins
A History of Bioinformatics: Development of in silico Approaches to Evaluate Food Proteins /////////// Andre Silvanovich Ph. D. Bayer Crop Sciences Chesterfield, MO October 2018 Bioinformatic Evaluation
More informationSequence Databases and database scanning
Sequence Databases and database scanning Marjolein Thunnissen Lund, 2012 Types of databases: Primary sequence databases (proteins and nucleic acids). Composite protein sequence databases. Secondary databases.
More informationTheory and Application of Multiple Sequence Alignments
Theory and Application of Multiple Sequence Alignments a.k.a What is a Multiple Sequence Alignment, How to Make One, and What to Do With It Brett Pickett, PhD History Structure of DNA discovered (1953)
More informationFunctional profiling of metagenomic short reads: How complex are complex microbial communities?
Functional profiling of metagenomic short reads: How complex are complex microbial communities? Rohita Sinha Senior Scientist (Bioinformatics), Viracor-Eurofins, Lee s summit, MO Understanding reality,
More informationOptimization of Process Parameters of Global Sequence Alignment Based Dynamic Program - an Approach to Enhance the Sensitivity.
Optimization of Process Parameters of Global Sequence Alignment Based Dynamic Program - an Approach to Enhance the Sensitivity of Alignment Dr.D.Chandrakala 1, Dr.T.Sathish Kumar 2, S.Preethi 3, D.Sowmya
More informationAny questions? d N /d S ratio (ω) estimated across all sites is inefficient at detecting positive selection
Any questions? Based upon this BlastP result, is the query used homologous to lysin? Blast Phylogenies Likelihood Ratio Tests The current size of the NR protein database is 680,984,053 and the PDB is 3,816,875.
More informationThe University of California, Santa Cruz (UCSC) Genome Browser
The University of California, Santa Cruz (UCSC) Genome Browser There are hundreds of available userselected tracks in categories such as mapping and sequencing, phenotype and disease associations, genes,
More informationGood Science. Benefit to Society
Good Science Benefit to Society Human genes claimed in granted U.S. patents Jensen and Murray, Science 310:239-240 (14 Oct. 2005) Specifically, this map is based on a BLAST homology search linking nucleotide
More informationBiotechnology Project Lab
Only for teaching purposes - not for reproduction or sale Advanced Cell Biology & Biotechnology Biotechnology Project Lab Giovanna Gambarotta COMPETENCES THAT YOU WILL ACQUIRE - compare DNA sequences -
More informationIntroduction to Cellular Biology and Bioinformatics. Farzaneh Salari
Introduction to Cellular Biology and Bioinformatics Farzaneh Salari Outline Bioinformatics Cellular Biology A Bioinformatics Problem What is bioinformatics? Computer Science Statistics Bioinformatics Mathematics...
More informationGenome Sequence Assembly
Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:
More informationWeb-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide.
Page 1 of 18 Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide. When and Where---Wednesdays 1-2pm Room 438 Library Admin Building Beginning September
More informationBLASTing through the kingdom of life
Information for teachers Description: In this activity, students copy unknown DNA sequences and use them to search GenBank, the main database of nucleotide sequences at the National Center for Biotechnology
More informationArray-Ready Oligo Set for the Rat Genome Version 3.0
Array-Ready Oligo Set for the Rat Genome Version 3.0 We are pleased to announce Version 3.0 of the Rat Genome Oligo Set containing 26,962 longmer probes representing 22,012 genes and 27,044 gene transcripts.
More informationCS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes
CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes Coalescence Scribe: Alex Wells 2/18/16 Whenever you observe two sequences that are similar, there is actually a single individual
More informationGerm-line vs somatic-variation theories
BME 128 Tuesday April 26 (1) Filling in the gaps Antibody diversity, how is it achieved? - by specialised (!) mechanisms Chp6 (Protein Diversity & Sequence Analysis) - more about the main concepts in this
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Dr. Taysir Hassan Abdel Hamid Lecturer, Information Systems Department Faculty of Computer and Information Assiut University taysirhs@aun.edu.eg taysir_soliman@hotmail.com
More informationSingle Nucleotide Variant Analysis. H3ABioNet May 14, 2014
Single Nucleotide Variant Analysis H3ABioNet May 14, 2014 Outline What are SNPs and SNVs? How do we identify them? How do we call them? SAMTools GATK VCF File Format Let s call variants! Single Nucleotide
More informationChapter 2: Access to Information
Chapter 2: Access to Information Outline Introduction to biological databases Centralized databases store DNA sequences Contents of DNA, RNA, and protein databases Central bioinformatics resources: NCBI
More informationLab Week 9 - A Sample Annotation Problem (adapted by Chris Shaffer from a worksheet by Varun Sundaram, WU-STL, Class of 2009)
Lab Week 9 - A Sample Annotation Problem (adapted by Chris Shaffer from a worksheet by Varun Sundaram, WU-STL, Class of 2009) Prerequisites: BLAST Exercise: An In-Depth Introduction to NCBI BLAST Familiarity
More informationCS313 Exercise 1 Cover Page Fall 2017
CS313 Exercise 1 Cover Page Fall 2017 Due by the start of class on Monday, September 18, 2017. Name(s): In the TIME column, please estimate the time you spent on the parts of this exercise. Please try
More informationBioinformatics for Proteomics. Ann Loraine
Bioinformatics for Proteomics Ann Loraine aloraine@uab.edu What is bioinformatics? The science of collecting, processing, organizing, storing, analyzing, and mining biological information, especially data
More information