Making Sense of DNA and Protein Sequences. Lily Wang, PhD Department of Biostatistics Vanderbilt University
2 Outline. Biological background; major biological sequence databanks; basic concepts in sequence comparison methods; substitution matrices and gap penalties; methods for aligning two sequences; statistical distributions of alignment scores; software (BLAST).
3 DNA. The genetic material that is physically transmitted from parent to offspring. A double-stranded helix. Within a single chain, each nucleotide consists of a base, a sugar and a phosphate group. A, G, C and T refer to the bases. A pairs with T; G pairs with C.
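The pairing rule on this slide is enough to derive one strand from the other. A minimal sketch in Python; the function names are illustrative and not from the lecture.

```python
# Watson-Crick pairing as stated on the slide: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))
    print(reverse_complement("ATGC"))
```

Reversing the complement reflects the fact that the two strands of the helix run in opposite directions.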
4 Proteins. Cellular activities are ultimately carried out through DNA-encoded proteins.
5 The Central Dogma of Molecular Biology
6 Splicing, Introns and Exons
7 The Growth of Biological Data
8 Goals of Functional Genomics. Understand the functions of genes and their interplay with proteins and the environment to create complex, dynamic living systems.
9 Sequence Databases: Primary Nucleotide Databases. These contain sequences derived from sequencing a biological molecule that exists in a test tube somewhere in a lab. They do not represent sequences that are a consensus of a population. Examples: GenBank, DDBJ, EMBL.
10 Sequence Databases: An Example Record
11 Secondary Databases: RefSeq. These are curated databases. Many sequences are represented more than once in GenBank, which leads to a high degree of redundancy. Goal: provide a reference for each molecule in the central dogma (DNA, mRNA and protein). RefSeq accession numbers use a 2+6 format: a two-letter prefix, an underscore and six digits. Experimentally determined sequence data: NT_ (genomic contigs, DNA), NM_ (mRNAs), NP_ (proteins). Computational predictions from raw DNA sequences: XM_ (model mRNAs), XP_ (model proteins).
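The prefix scheme above can be turned into a small lookup. This is only a sketch covering the five prefixes listed on the slide; the accession strings used below are made-up placeholders, not real records, and the function name is an assumption.

```python
# Prefix table taken from the slide: molecule type and evidence level.
REFSEQ_PREFIXES = {
    "NT": ("genomic contig (DNA)", "experimentally determined"),
    "NM": ("mRNA", "experimentally determined"),
    "NP": ("protein", "experimentally determined"),
    "XM": ("model mRNA", "computational prediction"),
    "XP": ("model protein", "computational prediction"),
}


def classify_accession(accession: str):
    """Return (molecule, evidence) for a RefSeq-style accession, or None.

    Only the two-letter prefix before the underscore is inspected; the
    digits are not validated.
    """
    prefix, separator, _rest = accession.partition("_")
    if not separator:
        return None
    return REFSEQ_PREFIXES.get(prefix.upper())


if __name__ == "__main__":
    # Placeholder accessions for illustration only.
    for acc in ("NM_000000", "XP_000000", "ABC123"):
        print(acc, classify_accession(acc))
```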
12 Protein Sequences: UniProt Knowledgebase. SWISSPROT: manually annotated records with curator-evaluated computational analysis. TrEMBL: computationally analyzed records awaiting manual annotation. Source: translation of all coding sequences (CDS) found in DDBJ/EMBL/GenBank, PDB entries, sequences submitted directly to UniProt, and PIR-PSD. Non-redundancy: all protein products derived from a given gene are described in a single record. Extensive cross-references: to GenBank, 2D-PAGE data, protein structure databases, protein domain and family characterization databases, species-specific data collections, and disease databases.
13 Amino Acid Codes. A sample record.
14 Sequence Comparisons. It is much easier to determine the sequences of genes and proteins than to determine their structure or function. New sequences are adapted from pre-existing sequences rather than invented anew. The more conserved amino acids in similar proteins from different species are the ones that play an essential role in structure and function. Significant similarities between sequences often give important clues about phylogeny, structure and function.
15 Origins of Genes Having a Similar Sequence
17 Sequence Comparisons. Human DNA has about 30,000 genes, and the functions of more than 50% of them are still unknown. Many human proteins share similarities with proteins from other organisms. By experimentation, and by comparing genes and proteins with those already known in the databases, our goal is to determine the functions of newly discovered genes and proteins.
18 Pairwise Comparisons. Decide on a scoring system. Align a given set of sequences to find the best matching region(s). Assign a score for the comparison between sequences. Evaluate the statistical significance of the score.
19 Substitution Matrices. A substitution matrix, or score matrix, is a matrix in which $s(a_i, a_j)$, the score for aligning amino acid $a_i$ with amino acid $a_j$, is in position $(i, j)$. Example: the BLOSUM 62 matrix.
20 Substitution Matrices: Not All Amino Acids Are Equal. Some are more easily substituted than others: some mutations occur more often, and some substitutions are kept more often. Mutations tend to favor some substitutions: some amino acids have similar codons (for example TTT and TTC for Phe, TTA and TTG for Leu), so a DNA mutation is more likely to change one into the other. Selection tends to favor some substitutions: some amino acids have similar properties or structure, so substitutions between them are more likely to be kept.
21 Log-Odds Score. Given a pair of aligned sequences, we want to assign a score to the alignment that measures the relative likelihood that the sequences are related as opposed to unrelated. Consider the likelihood ratio for two sequences $x$ and $y$: $\frac{P(x, y \mid \text{match model})}{P(x, y \mid \text{random model})} = \frac{\prod_i p_{x_i y_i}}{\prod_i q_{x_i} \prod_i q_{y_i}} = \prod_i \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}$. The log-odds ratio is $S = \sum_i s(x_i, y_i)$, where $s(a, b) = \log \frac{p_{ab}}{q_a q_b}$ is the log likelihood ratio of the residue pair $(a, b)$ occurring as an aligned pair, as opposed to an unaligned pair.
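The log-odds formula on this slide can be evaluated directly. The sketch below uses invented frequencies chosen only to make the arithmetic easy to follow; they are not taken from any real substitution matrix, and the choice of base 2 (scores in bits) is an assumption.

```python
import math


def log_odds(p_ab: float, q_a: float, q_b: float, base: float = 2.0) -> float:
    """s(a, b) = log( p_ab / (q_a * q_b) ).

    p_ab: frequency of the aligned pair (a, b) under the match model.
    q_a, q_b: background frequencies of a and b under the random model.
    """
    return math.log(p_ab / (q_a * q_b), base)


def alignment_log_odds(pairs, base: float = 2.0) -> float:
    """Sum the per-column scores; pairs is a list of (p_ab, q_a, q_b)."""
    return sum(log_odds(p, qa, qb, base) for p, qa, qb in pairs)


if __name__ == "__main__":
    # Hypothetical numbers: both residues have background frequency 0.05,
    # so a chance pairing has probability 0.0025.
    print(log_odds(0.01, 0.05, 0.05))      # pair seen 4x more often than chance
    print(log_odds(0.0025, 0.05, 0.05))    # pair seen exactly as often as chance
    print(log_odds(0.000625, 0.05, 0.05))  # pair seen 4x less often than chance
```

Pairs observed more often than chance get positive scores, pairs observed less often get negative scores, and summing over columns is the logarithm of the product in the likelihood ratio.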
22 Scoring an Alignment. The score of an alignment is the sum of the scores of all pairs of residues in the alignment. Example: sequence 1: TCCPSIVARSN; sequence 2: SCCPSISARNT; alignment score = 46. Maximal Segment Pair (MSP): given two protein sequences, the pair of equal-length segments that, when aligned, has the greatest aggregate score. An MSP may be of any length; its score is the MSP score.
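The column-by-column sum can be sketched as follows. The slide does not say which matrix produced 46; the entries below are the handful of values needed for this example as they are commonly tabulated for PAM250, which is an assumption on my part, and they do add up to the slide's total.

```python
# Assumed subset of PAM250 (only the pairs that occur in the example).
PAM250_SUBSET = {
    ("T", "S"): 1, ("C", "C"): 12, ("P", "P"): 6, ("S", "S"): 2,
    ("I", "I"): 5, ("V", "S"): -1, ("A", "A"): 2, ("R", "R"): 6,
    ("S", "N"): 1, ("N", "T"): 0,
}


def pair_score(a: str, b: str, matrix=PAM250_SUBSET) -> int:
    """Look up a symmetric substitution score for residues a and b."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def alignment_score(seq1: str, seq2: str, matrix=PAM250_SUBSET) -> int:
    """Sum the substitution scores over an ungapped, equal-length alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment requires equal lengths")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


if __name__ == "__main__":
    print(alignment_score("TCCPSIVARSN", "SCCPSISARNT"))
```

A full implementation would load a complete matrix rather than a hand-typed subset.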
23 Theories on Substitution Matrices. Among the MSPs from the comparison of random sequences, the amino acids $a_i$ and $a_j$ are aligned with target frequency $q_{ij} = p_i p_j e^{\lambda S_{ij}}$ (Altschul, 1991), where $p_i$ is the background frequency of $a_i$ and $S_{ij}$ is the matrix score for the pair. Note that $p$ and $q$ swap roles relative to slide 21, where $p_{ab}$ denotes the pair frequency. Among alignments representing distant homologies, the amino acids are paired with certain characteristic frequencies. Only if these correspond to a matrix's target frequencies, it has been argued, can the matrix be optimal for distinguishing distant local homologies from similarities due to chance (Karlin & Altschul, 1990). Any substitution matrix is implicitly a log-odds matrix, with a specific target distribution for aligned pairs of amino acid residues.
24 Substitution Matrices. The PAM family: PAM matrices are based on global alignments of closely related proteins. PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence. Other PAM matrices are extrapolated from PAM1. The BLOSUM family: BLOSUM matrices are based on local alignments. BLOSUM 62 is calculated from blocks in which sequences more than 62% identical are clustered together, so it reflects comparisons of sequences sharing no more than 62% identity. All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins. Though BLOSUM 62 is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships.
25 Selecting Optimal Matrices (Altschul, 1991; Henikoff & Henikoff, 1993; Wheeler, 2003)
Matrix | Best use | Similarity (%)
PAM40 | Short alignments that are highly similar |
PAM160 | Detecting members of a protein family |
PAM250 | Longer alignments of more divergent sequences | ~30
BLOSUM90 | Short alignments that are highly similar |
BLOSUM80 | Detecting members of a protein family |
BLOSUM62 | Most effective in finding all potential similarities |
BLOSUM30 | Longer alignments of more divergent sequences | <30
26 Global vs. Local Alignment. Global alignment aligns the entire sequences, using all characters up to both ends of each sequence. Local alignment aligns only the best-matching parts of the sequences, those that give the highest scores. Rationale: distantly related proteins may share only isolated regions of similarity.
27 Methods for Pairwise Alignment. Dot matrix: reveals the presence of insertions/deletions and direct/inverted repeats; should be considered the first choice. Dynamic programming: guarantees the optimal alignment, but can be slow. k-tuple heuristic methods: do not guarantee the optimal alignment, but are fast; implemented in BLAST and FASTA.
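To make the dynamic-programming option and the global/local distinction concrete, here is a minimal sketch that fills both tables at once. It assumes a simple match/mismatch score and a linear gap penalty, which are illustrative choices; the lecture's substitution matrices and gap penalties would replace them in practice. The global recurrence is the Needleman-Wunsch scheme and the local one is the Smith-Waterman scheme.

```python
def align_scores(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (global_score, local_score) for sequences a and b."""
    n, m = len(a), len(b)
    glob = [[0] * (m + 1) for _ in range(n + 1)]
    loc = [[0] * (m + 1) for _ in range(n + 1)]
    # Global alignment must pay for leading gaps; local alignment starts at 0.
    for i in range(1, n + 1):
        glob[i][0] = i * gap
    for j in range(1, m + 1):
        glob[0][j] = j * gap
    best_local = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            # The extra 0 lets a local alignment restart anywhere.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])
    return glob[n][m], best_local


if __name__ == "__main__":
    print(align_scores("ACGT", "ACT"))
    print(align_scores("GGGGACGT", "ACGTCCCC"))
```

Both tables cost time proportional to the product of the sequence lengths, which is why this approach is exact but slow for database-scale searches.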
28 Dot Matrix Method
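A dot matrix can be produced in a few lines. This sketch marks a cell when the windows starting at the two positions are identical; the window parameter and the text rendering are illustrative assumptions rather than the slide's exact procedure.

```python
def dot_matrix(seq_a: str, seq_b: str, window: int = 1):
    """Return rows of '*' and '.' with seq_a down the side and seq_b across."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            same = seq_a[i:i + window] == seq_b[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    for line in dot_matrix("GATTACA", "GATCACA"):
        print(line)
```

Runs of marks along a diagonal indicate matching regions; a diagonal that shifts sideways suggests an insertion or deletion, and a larger window suppresses isolated chance matches.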
29 Heuristic Method: Basic Local Alignment Search Tool (BLAST). Question: which database sequences are most similar to (or contain the most similar regions to) my previously uncharacterised sequence? BLAST finds the highest-scoring locally optimal alignments between a query sequence and a database. It is a very fast algorithm but does not guarantee the optimal alignment. It can be used to search extremely large databases and is sufficiently sensitive and selective for most purposes.
30 BLAST. For a given word length w (usually 3 for proteins) and a given score matrix, create a list of all words (w-mers) that score greater than a threshold T when compared to w-mers from the query.
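The word-list step can be sketched by brute force: enumerate every possible w-mer and keep those scoring above T against a query word. The scoring function here (+5 for identity, -1 otherwise) is a toy stand-in for a real substitution matrix, and the function names are illustrative; BLAST's actual implementation is more elaborate.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def toy_score(a: str, b: str) -> int:
    """Toy stand-in for a substitution matrix entry."""
    return 5 if a == b else -1


def query_words(query: str, w: int = 3):
    """All overlapping w-mers of the query."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def neighborhood(word: str, threshold: int, score=toy_score, alphabet=AMINO_ACIDS):
    """All w-mers whose score against `word` is strictly greater than threshold."""
    hits = []
    for candidate in product(alphabet, repeat=len(word)):
        total = sum(score(a, b) for a, b in zip(word, candidate))
        if total > threshold:
            hits.append("".join(candidate))
    return hits


if __name__ == "__main__":
    print(query_words("PQGEF"))
    print(len(neighborhood("PQG", threshold=8)))
```

With this toy scoring and T = 8, the neighborhood of a 3-mer is the word itself plus every word differing at one position.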
31 BLAST. Each neighborhood word gives all positions in the database where it is found (the hit list).
32 BLAST. The program tries to extend matching segments (seeds) in both directions by adding pairs of residues. Residues are added until the incremental score drops below a threshold.
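One common way to realise "extend until the score drops" is a drop-off rule: keep extending while the running score stays within a fixed amount of the best score seen so far, then report the best extension. The sketch below uses that rule with a toy score; the drop-off value, the scoring and the function names are assumptions, not parameters from the lecture.

```python
def simple_score(a: str, b: str) -> int:
    """Toy scoring: +5 for a match, -4 for a mismatch."""
    return 5 if a == b else -4


def _extend(query, subject, i, j, direction, score, x_drop):
    """Walk from (i, j) in one direction; return (best gain, length at best)."""
    best = running = 0
    best_len = length = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        running += score(query[i], subject[j])
        length += 1
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break  # fallen too far below the best score seen so far
        i += direction
        j += direction
    return best, best_len


def extend_seed(query, subject, q_pos, s_pos, w, score=simple_score, x_drop=5):
    """Ungapped extension of a length-w seed.

    Returns (score, q_begin, q_end) with q_end exclusive.
    """
    seed = sum(score(query[q_pos + k], subject[s_pos + k]) for k in range(w))
    right, right_len = _extend(query, subject, q_pos + w, s_pos + w, 1, score, x_drop)
    left, left_len = _extend(query, subject, q_pos - 1, s_pos - 1, -1, score, x_drop)
    return seed + left + right, q_pos - left_len, q_pos + w + right_len


if __name__ == "__main__":
    q, s = "AAPQGEFGTT", "CCPQGEFGWW"
    total, begin, end = extend_seed(q, s, q_pos=2, s_pos=2, w=3)
    print(total, q[begin:end])
```

Trimming back to the best-scoring point means trailing mismatches explored before the cut-off are not included in the reported segment.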
33 The Five BLAST Programs
Program | Database | Query | Typical uses
BLASTN | Nucleotide | Nucleotide | Mapping oligonucleotides, cDNAs and PCR products to a genome; screening repetitive elements; cross-species sequence exploration; annotating genomic DNA; clustering sequencing reads; vector clipping.
BLASTP | Protein | Protein | Identifying common regions between proteins; collecting related proteins for phylogenetic analyses.
BLASTX | Protein | Nucleotide translated into protein | Finding protein-coding genes in genomic DNA; determining if a cDNA corresponds to a known protein.
TBLASTN | Nucleotide translated into protein | Protein | Identifying transcripts, potentially from multiple organisms, similar to a given protein; mapping a protein to genomic DNA.
TBLASTX | Nucleotide translated into protein | Nucleotide translated into protein | Cross-species gene prediction at the genome or transcript level; searching for genes missed by traditional methods or not yet in protein databases.
34 Distribution of Pairwise Local Alignment Scores (Karlin-Altschul Statistics): Assumptions. There is at least one positive score. The expected score is negative. The letters of the sequences are i.i.d. The sequences are infinitely long. The alignment does not contain gaps.
35 Statistical Distribution of Pairwise Local Alignment Scores. Given two random protein sequences, the number of distinct, or "locally optimal", MSPs with scores at least $S$ expected to occur simply by chance is $E = K N e^{-\lambda S}$, where $N$ is the product of the sequence lengths, $K$ is an explicitly calculable parameter, and $\lambda$ is the unique positive solution to $\sum_{i,j} p_i p_j e^{\lambda S_{ij}} = 1$. This is the E value reported by BLAST.
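The equation for $\lambda$ rarely has a closed form, but it is easy to solve numerically. The sketch below uses bisection on a toy DNA scoring system (uniform base frequencies, +1 match, -1 mismatch), for which the exact answer is $\ln 3$. The value of K passed to the E-value function is a placeholder assumption, since computing K is beyond this sketch.

```python
import math


def solve_lambda(freqs, score, iterations: int = 200) -> float:
    """Positive root of sum_ij p_i p_j exp(lambda * s_ij) = 1 by bisection.

    Assumes the expected score is negative and at least one score is
    positive, as required by the Karlin-Altschul assumptions.
    """
    def f(lam):
        return sum(pa * pb * math.exp(lam * score(a, b))
                   for a, pa in freqs.items()
                   for b, pb in freqs.items()) - 1.0

    lo, hi = 1e-6, 1.0
    while f(hi) <= 0:      # grow the bracket until the function turns positive
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def e_value(s: float, m: int, n: int, lam: float, k: float) -> float:
    """E = K * N * exp(-lambda * S), with N = m * n."""
    return k * m * n * math.exp(-lam * s)


if __name__ == "__main__":
    uniform_dna = {base: 0.25 for base in "ACGT"}
    lam = solve_lambda(uniform_dna, lambda a, b: 1 if a == b else -1)
    print(lam, math.log(3))
    # K = 0.1 is an arbitrary placeholder, not a computed value.
    print(e_value(20, m=100, n=1000, lam=lam, k=0.1))
```

For the toy system the sum reduces to $0.25 e^{\lambda} + 0.75 e^{-\lambda} = 1$, whose positive root is $e^{\lambda} = 3$.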
36 Statistical Significance of Pairwise Alignment: Analysis of Headruns. Two sequences $A_1, A_2, \ldots, A_n$ and $B_1, \ldots, B_n$ of the same length. The letters are i.i.d. Consider a fixed alignment, that is, no shifts. Consider exact matching, that is, no mismatches or indels.
37 Alignment Scores. The highest local alignment score $H(A, B) = \max\{ s(I, J) : I \subset A, J \subset B \}$ indicates the best matching region along sequences $A$ and $B$. For exact matching without shifts, $R_n = \max\{ m : A_{i+k} = B_{i+k} \text{ for } k = 1 \text{ to } m,\ 0 \le i \le n - m \}$. Flip a coin $n$ times with $p = \Pr(A_i = B_i)$ for heads each time; the longest run of heads corresponds to $R_n$.
38 Analysis of Headruns (Waterman, 1995). Heuristics: a headrun of length $m$ has probability $p^m$, and there are about $n$ possible headruns, so $E(\text{number of headruns of length } m) \approx n p^m$. If the largest run is unique, its length $R_n$ should satisfy $n p^{R_n} = 1$, which has the solution $R_n = \log_{1/p} n$. Theorem 1: Let $A_1, A_2, \ldots, B_1, B_2, \ldots$ be independent and identically distributed with $0 < p \equiv \Pr(A_1 = B_1) < 1$. Then $\Pr\left( \lim_{n \to \infty} \frac{R_n}{\log_{1/p} n} = 1 \right) = 1$.
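The heuristic $R_n \approx \log_{1/p} n$ is easy to check by simulation. This is an illustrative sketch only; the seed and sample sizes are arbitrary, and a single simulated run will scatter around the predicted value rather than hit it exactly.

```python
import math
import random


def longest_head_run(flips) -> int:
    """Length of the longest run of True values."""
    best = run = 0
    for heads in flips:
        run = run + 1 if heads else 0
        best = max(best, run)
    return best


def predicted_run(n: int, p: float) -> float:
    """The heuristic solution of n * p**R = 1, i.e. R = log_{1/p}(n)."""
    return math.log(n) / math.log(1.0 / p)


def simulate(n: int, p: float, seed: int = 0) -> int:
    """Flip a p-coin n times and return the longest head run observed."""
    rng = random.Random(seed)
    return longest_head_run(rng.random() < p for _ in range(n))


if __name__ == "__main__":
    for n in (1_000, 100_000):
        print(n, simulate(n, 0.25), round(predicted_run(n, 0.25), 2))
```

Here p = 0.25 corresponds to matching letters between two uniform random DNA sequences.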
39 Chen-Stein Method of Poisson Approximation
40 Analysis of Headruns
42 Distribution of Pairwise Local Alignment Scores (Karlin and Altschul, 1990). BLAST score theory is, of course, much more complicated than the headrun analysis: unequal sequence lengths, shifting, and a score matrix (ungapped case). The Karlin-Altschul distribution of local alignment scores is $\Pr(S > x) \approx 1 - \exp(-K m n e^{-\lambda x})$, where $m$ and $n$ are the sequence lengths.
43 E value and p-value. The E value is the number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score. The p-value is the probability of an alignment occurring with the score in question or better.
44 E-value or p-value. It is more appropriate to rank the importance of an alignment by its p-value than by its raw score, since matches with long sequences can yield larger scores simply due to sequence length. The BLAST programs report E-values rather than P-values because it is easier to understand the difference between, for example, E-values of 5 and 10 than between the corresponding P-values, which are both very close to 1. When E < 0.01, P-values and E-values are nearly identical. Use the p-value for evaluating cases on the boundary of statistical significance.
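Combining the E-value formula with the Karlin-Altschul tail probability gives $P = 1 - e^{-E}$, which makes the comparison on this slide easy to reproduce. A short sketch; the function name is illustrative.

```python
import math


def p_from_e(e_value: float) -> float:
    """P = 1 - exp(-E); expm1 keeps precision when E is very small."""
    return -math.expm1(-e_value)


if __name__ == "__main__":
    for e in (10, 5, 1, 0.01, 0.001):
        print(e, p_from_e(e))
```

For large E the P-values crowd together just below 1, while for small E the two quantities are almost equal because $1 - e^{-E} \approx E$.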
45 References
BLAST
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 215.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25(17).
Substitution Matrices
Dayhoff MO, Schwartz RM, Orcutt BC (1978). Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Washington, DC, 5.
Henikoff S, Henikoff JG (1992). Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences 89.
Altschul SF (1991). Amino acid substitution matrices from an information theoretical perspective. Journal of Molecular Biology 219.
46 References: Distributions of Pairwise Sequence Alignment Scores
Karlin S, Altschul SF (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences 87.
Arratia R, Goldstein L, Gordon L (1989). Two moments suffice for Poisson approximation: the Chen-Stein method. Annals of Probability 17, 9-25.
Waterman MS, Vingron M (1994). Sequence comparison significance and Poisson approximation. Statistical Science 9.
Siegmund D, Yakir B (2000). Approximate p-values for local sequence alignments. Annals of Statistics 28(3).
47 Thank You! 47
More informationExploring the Genetic Basis for Behavior. Instructor s Notes
Exploring the Genetic Basis for Behavior Instructor s Notes Introduction This lab was designed for our 300-level Advanced Genetics course taken by juniors and seniors majoring in Biology or Biochemistry.
More informationBio11 Announcements. Ch 21: DNA Biology and Technology. DNA Functions. DNA and RNA Structure. How do DNA and RNA differ? What are genes?
Bio11 Announcements TODAY Genetics (review) and quiz (CP #4) Structure and function of DNA Extra credit due today Next week in lab: Case study presentations Following week: Lab Quiz 2 Ch 21: DNA Biology
More informationThe Chemistry of Genes
The Chemistry of Genes Adapted from Success in Science: Basic Biology Key Words Codon: Group of three bases on a strand of DNA Gene: Portion of DNA that contains the information needed to make a specific
More informationArticle A Teaching Approach From the Exhaustive Search Method to the Needleman Wunsch Algorithm
Article A Teaching Approach From the Exhaustive Search Method to the Needleman Wunsch Algorithm Zhongneng Xu * Yayun Yang Beibei Huang, From the Department of Ecology, Jinan University, Guangzhou 510632,
More informationBig picture and history
Big picture and history (and Computational Biology) CS-5700 / BIO-5323 Outline 1 2 3 4 Outline 1 2 3 4 First to be databased were proteins The development of protein- s (Sanger and Tuppy 1951) led to the
More informationBiotechnology Explorer
Biotechnology Explorer C. elegans Behavior Kit Bioinformatics Supplement explorer.bio-rad.com Catalog #166-5120EDU This kit contains temperature-sensitive reagents. Open immediately and see individual
More informationSection 10.3 Outline 10.3 How Is the Base Sequence of a Messenger RNA Molecule Translated into Protein?
Section 10.3 Outline 10.3 How Is the Base Sequence of a Messenger RNA Molecule Translated into Protein? Messenger RNA Carries Information for Protein Synthesis from the DNA to Ribosomes Ribosomes Consist
More informationDNA is the genetic material. DNA structure. Chapter 7: DNA Replication, Transcription & Translation; Mutations & Ames test
DNA is the genetic material Chapter 7: DNA Replication, Transcription & Translation; Mutations & Ames test Dr. Amy Rogers Bio 139 General Microbiology Hereditary information is carried by DNA Griffith/Avery
More informationAdv Biology: DNA and RNA Study Guide
Adv Biology: DNA and RNA Study Guide Chapter 12 Vocabulary -Notes What experiments led up to the discovery of DNA being the hereditary material? o The discovery that DNA is the genetic code involved many
More informationIntroduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013
Introduction to RNA-Seq David Wood Winter School in Mathematics and Computational Biology July 1, 2013 Abundance RNA is... Diverse Dynamic Central DNA rrna Epigenetics trna RNA mrna Time Protein Abundance
More informationDNA is normally found in pairs, held together by hydrogen bonds between the bases
Bioinformatics Biology Review The genetic code is stored in DNA Deoxyribonucleic acid. DNA molecules are chains of four nucleotide bases Guanine, Thymine, Cytosine, Adenine DNA is normally found in pairs,
More informationALGORITHMS IN BIO INFORMATICS. Chapman & Hall/CRC Mathematical and Computational Biology Series A PRACTICAL INTRODUCTION. CRC Press WING-KIN SUNG
Chapman & Hall/CRC Mathematical and Computational Biology Series ALGORITHMS IN BIO INFORMATICS A PRACTICAL INTRODUCTION WING-KIN SUNG CRC Press Taylor & Francis Group Boca Raton London New York CRC Press
More informationMolecular Databases and Tools
NWeHealth, The University of Manchester Molecular Databases and Tools Afternoon Session: NCBI/EBI resources, pairwise alignment, BLAST, multiple sequence alignment and primer finding. Dr. Georgina Moulton
More informationGenome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013)
Genome annotation Erwin Datema (2011) Sandra Smit (2012, 2013) Genome annotation AGACAAAGATCCGCTAAATTAAATCTGGACTTCACATATTGAAGTGATATCACACGTTTCTCTAAT AATCTCCTCACAATATTATGTTTGGGATGAACTTGTCGTGATTTGCCATTGTAGCAATCACTTGAA
More informationChapter 12 Packet DNA 1. What did Griffith conclude from his experiment? 2. Describe the process of transformation.
Chapter 12 Packet DNA and RNA Name Period California State Standards covered by this chapter: Cell Biology 1. The fundamental life processes of plants and animals depend on a variety of chemical reactions
More informationSequence Analysis Lab Protocol
Sequence Analysis Lab Protocol You will need this handout of instructions The sequence of your plasmid from the ABI The Accession number for Lambda DNA J02459 The Accession number for puc 18 is L09136
More informationBundle 5 Test Review
Bundle 5 Test Review DNA vs. RNA DNA Replication Gene Mutations- Protein Synthesis 1. Label the different components and complete the complimentary base pairing. What is this molecule called? _Nucleic
More informationTIGR THE INSTITUTE FOR GENOMIC RESEARCH
Introduction to Genome Annotation: Overview of What You Will Learn This Week C. Robin Buell May 21, 2007 Types of Annotation Structural Annotation: Defining genes, boundaries, sequence motifs e.g. ORF,
More informationChapter 12. DNA TRANSCRIPTION and TRANSLATION
Chapter 12 DNA TRANSCRIPTION and TRANSLATION 12-3 RNA and Protein Synthesis WARM UP What are proteins? Where do they come from? From DNA to RNA to Protein DNA in our cells carry the instructions for making
More informationMolecular Genetics Quiz #1 SBI4U K T/I A C TOTAL
Name: Molecular Genetics Quiz #1 SBI4U K T/I A C TOTAL Part A: Multiple Choice (15 marks) Circle the letter of choice that best completes the statement or answers the question. One mark for each correct
More informationCH 17 :From Gene to Protein
CH 17 :From Gene to Protein Defining a gene gene gene Defining a gene is problematic because one gene can code for several protein products, some genes code only for RNA, two genes can overlap, and there
More informationExploring Similarities of Conserved Domains/Motifs
Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;
More informationIntroduction to Molecular Biology
Introduction to Molecular Biology Content Cells and organisms Molecules of life (Biomolecules) Central dogma of molecular biology Genes and gene expression @: Most pictures have been freely obtained from:
More informationUCSC Genome Browser. Introduction to ab initio and evidence-based gene finding
UCSC Genome Browser Introduction to ab initio and evidence-based gene finding Wilson Leung 06/2006 Outline Introduction to annotation ab initio gene finding Basics of the UCSC Browser Evidence-based gene
More informationAn introduction to genetics and molecular biology
An introduction to genetics and molecular biology Cavan Reilly September 5, 2017 Table of contents Introduction to biology Some molecular biology Gene expression Mendelian genetics Some more molecular
More informationAdvisors: Prof. Louis T. Oliphant Computer Science Department, Hiram College.
Author: Sulochana Bramhacharya Affiliation: Hiram College, Hiram OH. Address: P.O.B 1257 Hiram, OH 44234 Email: bramhacharyas1@my.hiram.edu ACM number: 8983027 Category: Undergraduate research Advisors:
More informationIndependent Study Guide The Blueprint of Life, from DNA to Protein (Chapter 7)
Independent Study Guide The Blueprint of Life, from DNA to Protein (Chapter 7) I. General Principles (Chapter 7 introduction) a. Morse code distinct series of dots and dashes encode the 26 letters of the
More informationDNA RNA PROTEIN. Professor Andrea Garrison Biology 11 Illustrations 2010 Pearson Education, Inc. unless otherwise noted
DNA RNA PROTEIN Professor Andrea Garrison Biology 11 Illustrations 2010 Pearson Education, Inc. unless otherwise noted DNA Molecule of heredity Contains all the genetic info our cells inherit Determines
More information