7.91 Lecture #1 Introduction to Bioinformatics
|
|
- Garry Gervase Gray
- 5 years ago
- Views:
Transcription
1 7.91 Lecture #1 Introduction to Bioinformatics Focus on Kinases Michael Yaffe & Pairwise Sequence Comparisons ARDFSHGLLENKLLGCDSMRWE.::..:::..:::: :::. GRDYKMALLEQWILGCD-MRWD Reading: This lecture: Mount pp.1-7, 29-35, 45-48, Next lecture: Mount pp. 8-9, 65-89, , +more The Protein Data Bank (PDB - is the single worldwide repository for the processing and distribution of 3-D biological macromolecular structure data. Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. The Protein Data Bank. Nucleic Acids Research 28 (2): (PDB Advisory Notice on using materials available in the archive:
2 Outline Review of biological fundamentals Genetics example Biological databases/ncbi resources Simplified sequence analysis where to start Simple sequence comparisons Definitions of related sequences Concepts and types of alignments the good, the bad, and the ugly Dot matrix alignments Computational efficiency Recursion and dynamic programming Substitution matrices: PAM, BLOSUM, Gonnet
3 Outline (cont) Gaps Applied dynamic programming: global alignments: Needleman-Wunsch Applied dynamic programming: local alignments Smith-Waterman Basic statistics of sequence alignments
4 Review of biological fundamentals Genetic material gene as a concept DNA as hereditary material DNA RNA Protein Critical cell functions
5 DNA structure A G C T U Bases
6 DNA structure A T G C Base pairing
7 DNA structure Nucleoside Nucleotide
8 DNA structure 5 3
9 DNA structure 5 3
10 DNA structure
11 RNA structure
12 Transcription OH 3 5 DNA OH RNA OH 5 3
13 Protein structure
14 Nonpolar, Hydrophobic R-groups H O H O H O H O H O H 3 N + C C H H O - 3 N + C C H CH O - 3 N + C C H H O - 3 N + 3 N + C C C C 3 CH CH O - H CH O C CH3 CH3 CH 3 CH 3 CH2 Glycine (Gly, G) Alanine (Ala, A) Valine (Val, V) CH 3 Leucine (Leu, L) Isoleucine (IIe, I) C C H O H O H O H O H 3 N + C C H 3 N + H 3 N + C C H 2 N + C C O CH - O - 2 CH2 CH 2 CH2 NH O - H 2 C CH 2 CH 2 O - S CH 3 Phenylalanine (Phe, F) Proline (Pro, P) Methionine (Met, M) Tryptophan (Trp, W) Polar, Hydrophillic R-groups H O H 3 N + C C O - CH2 OH Serine (Ser, S) H H O H O H O H O O H H H 3 N + C C H O - 3 N + C C H O - 3 N + C C 3 N + 3 N + C C C C O O - O CH2 CH - CH - CH2 CH2 2 OH CH C CH 3 SH 2 OH NH 2 O C NH 2 O Threonine (Thr, T) Cysteine (Cys, C) Tyrosine (Tyr, Y) Asparagine (Asn, N) Glutamine (Gln, Q) Electrically charged Acidic H O H 3 N + H C C O O - H C C CH2 3 N + H 3 N + O - CH2 C - O O CH 2 C - O O Glutamic acid (Glu, E) Aspartic acid (Asp, D) Basic H O H O C C H 3 N + C C CH2 O - CH2 O - CH 2 CH 2 CH 2 CH 2 CH 2 NH NH + 3 C NH + 2 Lysine (Lys, K) NH 2 Arginine (Arg, R) Aspartic acid (Asp, D) Arginine (Arg, R) O H C H 3 N + C O - CH2 NH NH + Histidine (His, H)
15 Codon Table (5')...pNpNpN...(3') in mrna Base at 5' End of Codon U Middle Base of Codon C A G Base at 3' End of Codon U phe (UUU) ser tyr cys U phe ser tyr cys C leu ser termination termination A leu ser termination trp G C leu leu leu pro pro pro his his gln arg arg arg U C A leu pro gln arg G A ile thr asn ser U ile thr asn ser C ile thr lys arg A met (and initiation) thr lys arg G G val val ala ala asp asp gly gly U C val ala glu gly A val ala glu gly G
16 DNA structure 3 5 Sense strand Anti-Sense strand 5 Convention: Sense strand Read 5 Æ 3 3 Reverse Complement
17 Genetics experiment: Isolate a yeat mutant that has increased chromosome number phenotype Can rescue the phenotype with a piece of DNA corresponds to a site of mutation in the yeasts DNA genotype CGTTTTTCCTGTAAAGCGCTTAATTTGTTTACCATTCTATAAAAACCTTGAGCTAAGGCCAACTGATGCA ATTGCTCAAGTGAATGCATAAACAAAGCAAGATCATTCTTAGCGCAAAAAAAACTGGGATTTTGAAATAC AACAAAAGAAAGAAGTAAAAAGGGAATGCAACGCAATAGTTTAGTAAATATCAAACTAAACGCTAATTCG CCATCGAAAAAGACCACAACAAGACCAAATACGTCCAGGATCAATAAACCATGGAGAATATCCCATTCGC CGCAGCAAAGAAACCCGAATTCAAAAATACCTTCACCTGTAAGAGAAAAATTGAACAGATTACCTGTAAA CAATAAGAAGTTTTTGGATATGGAAAGCTCCAAAATTCCATCACCTATAAGGAAAGCGACTTCTTCCAAA ATGATACACGAAAATAAGAAGCTACCTAAATTTAAATCCCTATCACTCGATGACTTTGAACTGGGGAAGA AATTAGGAAAGGGTAAATTCGGTAAAGTTTATTGCGTTCGGCACAGGAGTACAGGATATATTTGCGCACT GAAAGTAATGGAGAAGGAAGAAATAATAAAGTATAATTTACAGAAACAATTCAGAAGGGAGGTAGAAATA CAAACATCGCTAAATCATCCGAATCTAACTAAATCATACGGCTATTTTCATGATGAAAAAAGAGTGTACC TGCTAATGGAATACTTAGTCAATGGGGAAATGTATAAACTATTGAGGTTACACGGACCCTTCAACGATAT TTTAGCATCAGATTATATTTATCAAATTGCCAATGCCCTAGATTATATGCATAAAAAGAATATTATTCAT AGAGATATTAAACCTGAAAATATACTAATAGGGTTCAATAATGTCATTAAGTTAACGGACTTCGGATGGA GTATAATAAATCCGCCAGAAAATAGAAGGAAAACTGTCTGTGGGACAATTGACTACCTTTCTCCAGAAAT GGTGGAGTCAAGGGAATATGATCACACTATAGATGCATGGGCTCTTGGCGTCCTGGCGTTTGAACTACTG ACCGGTGCCCCTCCGTTCGAAGAAGAAATGAAAGATACTACATATAAAAGGATAGCAGCACTGGATATCA AAATGCCCAGTAACATTTCTCAGGATGCGCAAGATTTAATACTTAAACTACTAAAATACGACCCCAAAGA TAGAATGCGCCTTGGAGACGTAAAAATGCATCCTTGGATACTAAGAAACAAGCCCTTTTGGGAAAATAAG CGGTTATAGAATTAAAGTATGACAGAATCGTTTGAAGGGCACTATTAATCACTCCCGCACATATCACATA ATACTAAGTATCCATTTCTAATATTTCATTACTCTTTTCGGCATCGTATATTGCGATATTTGATTAAATT TTCTTGTTCATTTTTTCCTCTTTTCTTTCGCCTTTGTCGAAAGAAAAGAGGAAAACAAGCTGAAAATTGC TATGCATTAAAGTAGCAGATTTACTTTGTTGAGTTGGTTCTGATCAATAATAAGAGTAATGAAAGAAAGC AAAAAAATGGCTAAAGATAATTTAACTAATTTGCTCTCTCAATTGAACATTCAATTGTCTCAA
18 The National Center for Biotechnology Information Created as a part of NLM in 1988 Establish public databases Perform research in computational biology Develop software tools for sequence analysis Disseminate biomedical information
19 Molecular Databases Primary Databases Original submissions by experimentalists Database staff organize but don t add additional information Example: GenBank Derivative Databases Human curated compilation and correction of data Example: SWISS-PROT, NCBI RefSeq mrna Computationally Derived Example: UniGene Combinations Example: NCBI Genome Assembly
20 What is GenBank? NCBI s Primary Sequence Database Nucleotide only sequence database Archival in nature GenBank Data Direct submissions individual records (BankIt, Sequin) Batch submissions via (EST, GSS, STS) ftp accounts sequencing centers Data shared three collaborating databases GenBank DNA Database of Japan (DDBJ). European Molecular Biology Laboratory Database (EMBL) at EBI.
21 STSs BACs X Chromosome 163 Mb Relationships of chromosomes to genome sequencing markers. The X chromosome is about 163 Mb in length. In this diagram, there are 16 overlapping BAC clones that span the entire length. In reality, 1,48 BACs were needed to span the X chromosome. Arrows (top) mark STSs scattered throughout the chromosome and on overlapping BACs.
22 GenBank: NCBI s Primary Sequence Database Release 133 December 23 19,88,11 Records 28,57,99,166 Nucleotides 11, + Species full release every two months incremental and cumulative updates daily available only through internet ftp://ftp.ncbi.nih.gov/genbank/ 17.7 Gigabytes of data
23 GenBank Divisions Bulk Sequence Divisions PAT EST STS GSS HTG HTC CON Patent Expressed Sequence Tags Sequence Tagged Sites Genome Survey Sequences High Throughput Genome High Throughput cdna Contig Traditional Divisions BCT INV MAM PHG PLN PRI ROD SYN UNA VRL VRT
24 The International Sequence Database Collaboration NIH Submissions Updates NCBI Entrez GenBank EMBL Submissions Updates NIG CIB getentry DDBJ Submissions Updates SRS EBI EMBL
25 Web Access Text Entrez Sequence BLAST Structure VAST
26 Genetics experiment: Isolate a yeat mutant that has increased chromosome number phenotype Can rescue the phenotype with a piece of DNA corresponds to a site of mutation in the yeasts DNA genotype CGTTTTTCCTGTAAAGCGCTTAATTTGTTTACCATTCTATAAAAACCTTGAGCTAAGGCCAACTGATGCA ATTGCTCAAGTGAATGCATAAACAAAGCAAGATCATTCTTAGCGCAAAAAAAACTGGGATTTTGAAATAC AACAAAAGAAAGAAGTAAAAAGGGAATGCAACGCAATAGTTTAGTAAATATCAAACTAAACGCTAATTCG CCATCGAAAAAGACCACAACAAGACCAAATACGTCCAGGATCAATAAACCATGGAGAATATCCCATTCGC CGCAGCAAAGAAACCCGAATTCAAAAATACCTTCACCTGTAAGAGAAAAATTGAACAGATTACCTGTAAA CAATAAGAAGTTTTTGGATATGGAAAGCTCCAAAATTCCATCACCTATAAGGAAAGCGACTTCTTCCAAA ATGATACACGAAAATAAGAAGCTACCTAAATTTAAATCCCTATCACTCGATGACTTTGAACTGGGGAAGA AATTAGGAAAGGGTAAATTCGGTAAAGTTTATTGCGTTCGGCACAGGAGTACAGGATATATTTGCGCACT GAAAGTAATGGAGAAGGAAGAAATAATAAAGTATAATTTACAGAAACAATTCAGAAGGGAGGTAGAAATA CAAACATCGCTAAATCATCCGAATCTAACTAAATCATACGGCTATTTTCATGATGAAAAAAGAGTGTACC TGCTAATGGAATACTTAGTCAATGGGGAAATGTATAAACTATTGAGGTTACACGGACCCTTCAACGATAT TTTAGCATCAGATTATATTTATCAAATTGCCAATGCCCTAGATTATATGCATAAAAAGAATATTATTCAT AGAGATATTAAACCTGAAAATATACTAATAGGGTTCAATAATGTCATTAAGTTAACGGACTTCGGATGGA GTATAATAAATCCGCCAGAAAATAGAAGGAAAACTGTCTGTGGGACAATTGACTACCTTTCTCCAGAAAT GGTGGAGTCAAGGGAATATGATCACACTATAGATGCATGGGCTCTTGGCGTCCTGGCGTTTGAACTACTG ACCGGTGCCCCTCCGTTCGAAGAAGAAATGAAAGATACTACATATAAAAGGATAGCAGCACTGGATATCA AAATGCCCAGTAACATTTCTCAGGATGCGCAAGATTTAATACTTAAACTACTAAAATACGACCCCAAAGA TAGAATGCGCCTTGGAGACGTAAAAATGCATCCTTGGATACTAAGAAACAAGCCCTTTTGGGAAAATAAG CGGTTATAGAATTAAAGTATGACAGAATCGTTTGAAGGGCACTATTAATCACTCCCGCACATATCACATA ATACTAAGTATCCATTTCTAATATTTCATTACTCTTTTCGGCATCGTATATTGCGATATTTGATTAAATT TTCTTGTTCATTTTTTCCTCTTTTCTTTCGCCTTTGTCGAAAGAAAAGAGGAAAACAAGCTGAAAATTGC TATGCATTAAAGTAGCAGATTTACTTTGTTGAGTTGGTTCTGATCAATAATAAGAGTAATGAAAGAAAGC AAAAAAATGGCTAAAGATAATTTAACTAATTTGCTCTCTCAATTGAACATTCAATTGTCTCAA
27
28
29
30
31
32 Using Entrez An integrated database search and retrieval system
33 Entrez: Database Integration Word weight PubMed abstracts Phylogeny Taxonomy Genomes 3-D Structur e VAST BLAST Nucleotide sequences Protein sequences BLAST
34 The (ever) Expanding Entrez System UniGene PubMed Nucleotide Protein Journals Structure CDD Genome SNP Entrez PopSet 3D Domains UniSTS Taxonomy OMIM ProbeSet Books
35 Entrez Databases PubMed Biomedical literature Books Online textbooks Nucleotide GenBank, EMBL, DDBJ, RefSeq, PDB Protein [GenBank, EMBL, DDBJ], RefSeq, SWISS-PROT, PIR, PRF, PDB Genome Complete genomes Taxonomy Organisms in NCBI sequence databases Structure MMDB: experimental 3D structures Domains CDD: conserved protein domains 3D Domains Compact 3D protein domains in MMDB OMIM Online Mendelian Inheritance in Man SNP Single nucleotide polymorphisms UniSTS Sequence Tagged Site markers ProbeSet Gene expression and microarray datasets PopSet Population study datasets UniGene Gene-based expressed sequence clusters
36 A Traditional GenBank Record Definition =Title ACCESSION VERSION Version Number U7163 U GI:46243 Accession Number GI Number NCBI s Taxonomy
37 FASTA Format FASTA Definition Line >gi gb U SCU7163 > gi number Locus Name Database Identifiers gb GenBank emb EMBL dbj DDBJ sp SWISS-PROT pdb Protein Databank pir PIR prf PRF ref RefSeq Accession number
38 Abstract Syntax Notation: ASN.1 GenPept GenBank ASN.1 FASTA Protein FASTA Nucleotide
39 /************************************************************************ * * asn2ff.c * convert an ASN.1 entry to flat file format, using the FFPrintArray. * *********************************** Toolbox Sources ***************************************/ #include <accentr.h> #include "asn2ff.h" #include "asn2ffp.h" #include "ffprint.h" #include <subutil.h> ftp> open ftp.ncbi.nih.gov #include <objall.h> #include <objcode.h>. #include <lsqfetch.h> #include <explore.h> #ifdef ENABLE_ID1 #include <accid1.h> #endif FILE *fpl; NCBI Toolbox. ftp> cd toolbox ftp> cd ncbi_tools ftp://ftp.ncbi.nlm.gov/toolbox/ncbi_tools Args myargs[] = { {"Filename for asn.1 input","stdin",null,null,true,'a',arg_file_in,.,,null}, {"Input is a Seq-entry","F", NULL,NULL,TRUE,'e',ARG_BOOLEAN,.,,NULL}, {"Input asnfile in binary mode","f",null,null,true,'b',arg_boolean,.,,null}, {"Output Filename","stdout", NULL,NULL,TRUE,'o',ARG_FILE_OUT,.,,NULL}, {"Show Sequence?","T", NULL,NULL,TRUE,'h',ARG_BOOLEAN,.,,NULL},
40 /protein_id="aaa2496.1" /db_xref="gi:46244" GenPept Protein IDS
41
42
43
44
45
46
47
48
49
50
51
52
53 Why align sequences? Functional predictions based on identifying homologues. Assumes: conservation of sequence conservation of function BUT: Function carried out at level of proteins, i.e. 3-D structure Sequence conservation carried out at level of DNA 1-D sequence
54 Implicit Assumption of Evolution in Models of Sequence Homology Assume: Sequence conservation Structure conservation Note that the converse is NOT necessarily true! Structure conservation X Sequence conservation
55 Definitions Homologue: a relatively non-specific term (meaningless?) that conveys the idea that two sequences are somehow related Orthologue: Ortho = (greek) straight.implies direct descent, 1 ancestor Vav human Vav bovine Vav avian Vav elegans -or- human Vav avian Vav bovine Vav dinosaur Vav Ye olde Vav
56 Definitions Paralogue: Para = (greek) along side of.implies some type of gene duplication event Vav1 human Vav2human Vav3human -or-
57 Complex problem Chicken Vav Vav1 human Vav2human Vav3human Orthologue or Paralogue? Need to know evolutionary relationships Can try and get these from alignments as well
58 Alignments- Good Bad and Ugly HgbA-human GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL :+ +::+:::::+ :+++++::+: ::+::+ :: GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL HgbB-human
59 Alignments- Good Bad and Ugly HgbA-human SPURIOUS ALIGNMENT GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSD----LHAHKL ::+ ++: + ++:+: ++ :+ :+ : +:+ : ++::+ GSGYLVGDSLTFVDLL--VAQHTADLLAANAALLDEFPQFKAHQE Nematode glutathione S-transferase HgbA-human GSAQVKGHGKKVADALTNAVAHV---D--DMPNALSALSDLHAHKL :+ ::+ ++ +:++ + +: :+ +:+ : NNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKG Leghemoglobin, yellow lupin
60 Alignments Types: Local Global Ungapped Gapped (2 types- linear, affine) Methods: Dot matrix Dynamic Programming Word, k-tup
61 Simplest alignment representation Dot plot Model: Need a metric of similarity between amino acid pairs Simplest metric unitary matrix, identity matrix A A 1 C D E F G H C 1 D 1 E 1 F 1 G 1 H 1 I 1 K I K 1
62 Example self alignment G G 1 F D S F K F D 1 S 1 1 F 1 1 K 1 R 1 L 1 E 1 1 F 1 S 1 R L E F S E V
63 Example self alignment..with sliding window G G 1 F D S F K F D 1 S 1 1 F 1 1 K 1 R 1 L 1 E 1 1 F 1 S 1 R L E F S E V
64 Example self alignment G G 1 F D S F K F 1 1 D 1 S 1 F 1 K 1 R 1 L 1 E 1 F 1 S 1 R L E F S E V
65 Simple alignments Self alignment Dot Matrix DNA symmetric
66 Now align two different sequences * Consider other similarity matrices besides identity. Chemical similarity binary decision Amino acid conservation in aligned protein families min. similarity score (+/- window) Average of multiple scoring systems
67 Dot Matrix Alignments Sequence#1 Sequence#1 1 n 1 m Note NOT symmetric A Local Alignment
68 Dot Matrix Alignments Sequence#1 Sequence#1 1 1 n Insertion Insertion m A Global Alignment
69 General Rules for Dot Matrices Advantage let s your eyes/brain do the work VERY EFFICIENT!!!! DNA comparisons long windows and high stringencies (11/7, 15/11) Proteins short windows and stringencies (1/1) EXCEPT in looking for domains then longer windows and smaller stringencies (15/5). Note we can use these types of self-aligned matrices for more than just sequence comparisons i.e. distance between side Cα atoms in 3d-structures etc.maybe later in the course!
70 Computational Efficiency Measure efficiency in cpu run time and memory O() = big-oh notation Both scale as size of the problem, measured in number of units, n, in the problem, i.e. run time is f(n). Analyse the asymptotic worst-case running time. Sometimes just do the experiment and measure it. If problem scales as square of the number of units in the problem O(n 2 ) (=order n-squared)
71 Examples O(n k ) is polynomial time as long as K<3..tractable Consider our un-gapped dot matrix Global alignment: 1 m 1 n essentially an O(mn) problem
72 O.K. Examples O(n) better than O(n log(n)), better than O(n 2 ), better than O(n 3 ) Terrible Examples O(k n ) = exponential time.horrible!!!! NP problems- no known polynomial time Solutions = non-deterministic polynomial Problems.
11 questions for a total of 120 points
Your Name: BYS 201, Final Exam, May 3, 2010 11 questions for a total of 120 points 1. 25 points Take a close look at these tables of amino acids. Some of them are hydrophilic, some hydrophobic, some positive
More informationBasic concepts of molecular biology
Basic concepts of molecular biology Gabriella Trucco Email: gabriella.trucco@unimi.it Life The main actors in the chemistry of life are molecules called proteins nucleic acids Proteins: many different
More informationA Field Guide to GenBank and NCBI Molecular Biology Resources
A Field Guide to GenBank and NCBI Molecular Biology Resources slightly modified from Peter Cooper ftp://ftp.ncbi.nih.gov/pub/cooper/fieldguide/ Eric Sayers ftp://ftp.ncbi.nih.gov/pub/sayers/field_guide/u_penn/
More informationBi Lecture 3 Loss-of-function (Ch. 4A) Monday, April 8, 13
Bi190-2013 Lecture 3 Loss-of-function (Ch. 4A) Infer Gene activity from type of allele Loss-of-Function alleles are Gold Standard If organism deficient in gene A fails to accomplish process B, then gene
More informationAlgorithms in Bioinformatics ONE Transcription Translation
Algorithms in Bioinformatics ONE Transcription Translation Sami Khuri Department of Computer Science San José State University sami.khuri@sjsu.edu Biology Review DNA RNA Proteins Central Dogma Transcription
More informationDNA.notebook March 08, DNA Overview
DNA Overview Deoxyribonucleic Acid, or DNA, must be able to do 2 things: 1) give instructions for building and maintaining cells. 2) be copied each time a cell divides. DNA is made of subunits called nucleotides
More informationLecture 2 Introduction to Data Formats
Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 2 Introduction to Data Formats Introduction to Data Formats Real world, data and formats Sequences and
More informationBasic concepts of molecular biology
Basic concepts of molecular biology Gabriella Trucco Email: gabriella.trucco@unimi.it What is life made of? 1665: Robert Hooke discovered that organisms are composed of individual compartments called cells
More informationCAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools
CAP 5510: Introduction to Bioinformatics : Bioinformatics Tools ECS 254A / EC 2474; Phone x3748; Email: giri@cis.fiu.edu My Homepage: http://www.cs.fiu.edu/~giri http://www.cs.fiu.edu/~giri/teach/bioinfs15.html
More informationEE550 Computational Biology
EE550 Computational Biology Week 1 Course Notes Instructor: Bilge Karaçalı, PhD Syllabus Schedule : Thursday 13:30, 14:30, 15:30 Text : Paul G. Higgs, Teresa K. Attwood, Bioinformatics and Molecular Evolution,
More informationTypes of Databases - By Scope
Biological Databases Bioinformatics Workshop 2009 Chi-Cheng Lin, Ph.D. Department of Computer Science Winona State University clin@winona.edu Biological Databases Data Domains - By Scope - By Level of
More informationELE4120 Bioinformatics. Tutorial 5
ELE4120 Bioinformatics Tutorial 5 1 1. Database Content GenBank RefSeq TPA UniProt 2. Database Searches 2 Databases A common situation for alignment is to search through a database to retrieve the similar
More informationAPPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources
Appendix Table of Contents A2 A3 A4 A5 A6 A7 A9 Ethics Background Creating Discussion Ground Rules Amino Acid Abbreviations and Chemistry Resources Codons and Amino Acid Chemistry Behind the Scenes with
More informationBioinformatics. ONE Introduction to Biology. Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012
Bioinformatics ONE Introduction to Biology Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012 Biology Review DNA RNA Proteins Central Dogma Transcription Translation
More informationThe University of California, Santa Cruz (UCSC) Genome Browser
The University of California, Santa Cruz (UCSC) Genome Browser There are hundreds of available userselected tracks in categories such as mapping and sequencing, phenotype and disease associations, genes,
More informationMaterials Protein synthesis kit. This kit consists of 24 amino acids, 24 transfer RNAs, four messenger RNAs and one ribosome (see below).
Protein Synthesis Instructions The purpose of today s lab is to: Understand how a cell manufactures proteins from amino acids, using information stored in the genetic code. Assemble models of four very
More informationProblem: The GC base pairs are more stable than AT base pairs. Why? 5. Triple-stranded DNA was first observed in 1957. Scientists later discovered that the formation of triplestranded DNA involves a type
More informationChapter 2: Access to Information
Chapter 2: Access to Information Outline Introduction to biological databases Centralized databases store DNA sequences Contents of DNA, RNA, and protein databases Central bioinformatics resources: NCBI
More information6-Foot Mini Toober Activity
Big Idea The interaction between the substrate and enzyme is highly specific. Even a slight change in shape of either the substrate or the enzyme may alter the efficient and selective ability of the enzyme
More informationIntroduction to Bioinformatics. What are the goals of the course? Who is taking this course? Textbook. Web sites. Literature references
Introduction to Bioinformatics Who is taking this course? People with very diverse backgrounds in biology Some people with backgrounds in computer science and biostatistics Most people (will) have a favorite
More informationENZYMES AND METABOLIC PATHWAYS
ENZYMES AND METABOLIC PATHWAYS This document is licensed under the Attribution-NonCommercial-ShareAlike 2.5 Italy license, available at http://creativecommons.org/licenses/by-nc-sa/2.5/it/ 1. Enzymes build
More informationCS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS
1 CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Jean Gao at UT Arlington Mingon Kang, PhD Computer Science, Kennesaw State University 2 Genetics The discovery of
More informationBIOSTAT516 Statistical Methods in Genetic Epidemiology Autumn 2005 Handout1, prepared by Kathleen Kerr and Stephanie Monks
Rationale of Genetic Studies Some goals of genetic studies include: to identify the genetic causes of phenotypic variation develop genetic tests o benefits to individuals and to society are still uncertain
More informationGranby Transcription and Translation Services plc
ompany Resources ranby Transcription and Translation Services plc has invested heavily in the Protein Synthesis business. mongst the resources available to new recruits are: the latest cellphones which
More informationNucleic acid and protein Flow of genetic information
Nucleic acid and protein Flow of genetic information References: Glick, BR and JJ Pasternak, 2003, Molecular Biotechnology: Principles and Applications of Recombinant DNA, ASM Press, Washington DC, pages.
More informationProtein Structure Analysis
BINF 731 Protein Structure Analysis http://binf.gmu.edu/vaisman/binf731/ Secondary Structure: Computational Problems Secondary structure characterization Secondary structure assignment Secondary structure
More informationMolecular Biology. Biology Review ONE. Protein Factory. Genotype to Phenotype. From DNA to Protein. DNA à RNA à Protein. June 2016
Molecular Biology ONE Sami Khuri Department of Computer Science San José State University Biology Review DNA RNA Proteins Central Dogma Transcription Translation Genotype to Phenotype Protein Factory DNA
More informationDNA is normally found in pairs, held together by hydrogen bonds between the bases
Bioinformatics Biology Review The genetic code is stored in DNA Deoxyribonucleic acid. DNA molecules are chains of four nucleotide bases Guanine, Thymine, Cytosine, Adenine DNA is normally found in pairs,
More informationG4120: Introduction to Computational Biology
G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Lecture 3 February 13, 2003 Copyright 2003 Oliver Jovanovic, All Rights Reserved. Bioinformatics
More informationUsing DNA sequence, distinguish species in the same genus from one another.
Species Identification: Penguins 7. It s Not All Black and White! Name: Objective Using DNA sequence, distinguish species in the same genus from one another. Background In this activity, we will observe
More informationProtein Synthesis. Application Based Questions
Protein Synthesis Application Based Questions MRNA Triplet Codons Note: Logic behind the single letter abbreviations can be found at: http://www.biology.arizona.edu/biochemistry/problem_sets/aa/dayhoff.html
More informationProtein Bioinformatics Part I: Access to information
Protein Bioinformatics Part I: Access to information 260.655 April 6, 2006 Jonathan Pevsner, Ph.D. pevsner@kennedykrieger.org Outline [1] Proteins at NCBI RefSeq accession numbers Cn3D to visualize structures
More informationCECS Introduction to Bioinformatics University of Louisville Spring 2004 Dr. Eric Rouchka
Dr. Awwad Abdoh Radwan Dept Pharm Organic Chemistry, Faculty of Pharmacy, Assiut University. 29/09/2005 Required Texts Image Source: http://www.amazon.com/ Other Bioinformatics Books Image Source: http://www.amazon.com/
More informationNCBI web resources I: databases and Entrez
NCBI web resources I: databases and Entrez Yanbin Yin Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/ 1 Homework assignment 1 Two parts: Extract the gene IDs reported in table
More informationDynamic Programming Algorithms
Dynamic Programming Algorithms Sequence alignments, scores, and significance Lucy Skrabanek ICB, WMC February 7, 212 Sequence alignment Compare two (or more) sequences to: Find regions of conservation
More informationIntroduction to BIOINFORMATICS
Introduction to BIOINFORMATICS Antonella Lisa CABGen Centro di Analisi Bioinformatica per la Genomica Tel. 0382-546361 E-mail: lisa@igm.cnr.it http://www.igm.cnr.it/pagine-personali/lisa-antonella/ What
More informationNORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY DEPARTMENT OF BIOTECHNOLOGY Professor Bjørn E. Christensen, Department of Biotechnology
Page 1 NRWEGIAN UNIVERSITY F SCIENCE AND TECNLGY DEPARTMENT F BITECNLGY Professor Bjørn E. Christensen, Department of Biotechnology Contact during the exam: phone: 73593327, 92634016 EXAM TBT4135 BIPLYMERS
More informationWhat is necessary for life?
Life What is necessary for life? Most life familiar to us: Eukaryotes FREE LIVING Or Parasites First appeared ~ 1.5-2 10 9 years ago Requirements: DNA, proteins, lipids, carbohydrates, complex structure,
More informationWhat should you do to reinforce the lecture material?
Bioinformatics and Functional Genomics Course Overview, Introduction of Bioinformatics, Biology Background Biol4230 Thurs, Jan 17, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Goals of today
More informationFundamentals of Protein Structure
Outline Fundamentals of Protein Structure Yu (Julie) Chen and Thomas Funkhouser Princeton University CS597A, Fall 2005 Protein structure Primary Secondary Tertiary Quaternary Forces and factors Levels
More information03-511/711 Computational Genomics and Molecular Biology, Fall
03-511/711 Computational Genomics and Molecular Biology, Fall 2011 1 Problem Set 0 Due Tuesday, September 6th This homework is intended to be a self-administered placement quiz, to help you (and me) determine
More informationIntroduction to Bioinformatics. What are the goals of the course? Who is taking this course? Different user needs, different approaches
Introduction to Bioinformatics Who is taking this course? Monday, November 19, 2012 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707 People with very diverse backgrounds in biology
More informationinfo@rcsb.org www.pdb.org Bioinformatics of Green Fluorescent Protein This bioinformatics tutorial explores the relationship between gene sequence, protein structure, and biological function in the context
More informationUnit 1. DNA and the Genome
Unit 1 DNA and the Genome Gene Expression Key Area 3 Vocabulary 1: Transcription Translation Phenotype RNA (mrna, trna, rrna) Codon Anticodon Ribosome RNA polymerase RNA splicing Introns Extrons Gene Expression
More informationComputational Biology and Bioinformatics
Computational Biology and Bioinformatics Computational biology Development of algorithms to solve problems in biology Bioinformatics Application of computational biology to the analysis and management
More informationGenome Informatics. Systems Biology and the Omics Cascade (Course 2143) Day 3, June 11 th, Kiyoko F. Aoki-Kinoshita
Genome Informatics Systems Biology and the Omics Cascade (Course 2143) Day 3, June 11 th, 2008 Kiyoko F. Aoki-Kinoshita Introduction Genome informatics covers the computer- based modeling and data processing
More informationBME205: Lecture 2 Bio systems. David Bernick
BME205: Lecture 2 Bio systems David Bernick Bioinforma;cs Infer pa>erns from life biological sequences structures molecular pathways. Suggest hypotheses from inferred pa>erns Structure and Func;on Novel
More informationData Retrieval from GenBank
Data Retrieval from GenBank Peter J. Myler Bioinformatics of Intracellular Pathogens JNU, Feb 7-0, 2009 http://www.ncbi.nlm.nih.gov (January, 2007) http://ncbi.nlm.nih.gov/sitemap/resourceguide.html Accessing
More information36. The double bonds in naturally-occuring fatty acids are usually isomers. A. cis B. trans C. both cis and trans D. D- E. L-
36. The double bonds in naturally-occuring fatty acids are usually isomers. A. cis B. trans C. both cis and trans D. D- E. L- 37. The essential fatty acids are A. palmitic acid B. linoleic acid C. linolenic
More informationWhat is necessary for life?
Life What is necessary for life? Most life familiar to us: Eukaryotes FREE LIVING Or Parasites First appeared ~ 1.5-2 10 9 years ago Requirements: DNA, proteins, lipids, carbohydrates, complex structure,
More informationMATH 5610, Computational Biology
MATH 5610, Computational Biology Lecture 2 Intro to Molecular Biology (cont) Stephen Billups University of Colorado at Denver MATH 5610, Computational Biology p.1/24 Announcements Error on syllabus Class
More informationTextbook Reading Guidelines
Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum Last updated: May 1, 2009 Textbook Reading Guidelines Preface: Read the whole preface, and especially: For the students with Life Science
More informationKey Concept Translation converts an mrna message into a polypeptide, or protein.
8.5 Translation VOBLRY translation codon stop codon start codon anticodon Key oncept Translation converts an mrn message into a polypeptide, or protein. MIN IDES mino acids are coded by mrn base sequences.
More informationWhat You NEED to Know
What You NEED to Know Major DNA Databases NCBI RefSeq EBI DDBJ Protein Structural Databases PDB SCOP CCDC Major Protein Sequence Databases UniprotKB Swissprot PIR TrEMBL Genpept Other Major Databases MIM
More informationHomework. A bit about the nature of the atoms of interest. Project. The role of electronega<vity
Homework Why cited articles are especially useful. citeulike science citation index When cutting and pasting less is more. Project Your protein: I will mail these out this weekend If you haven t gotten
More informationA Zero-Knowledge Based Introduction to Biology
A Zero-Knowledge Based Introduction to Biology Konstantinos (Gus) Katsiapis 25 Sep 2009 Thanks to Cory McLean and George Asimenos Cells: Building Blocks of Life cell, membrane, cytoplasm, nucleus, mitochondrion
More informationIntroduction to Bioinformatics CPSC 265. What is bioinformatics? Textbooks
Introduction to Bioinformatics CPSC 265 Thanks to Jonathan Pevsner, Ph.D. Textbooks Johnathan Pevsner, who I stole most of these slides from (thanks!) has written a textbook, Bioinformatics and Functional
More informationBioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras. Lecture - 5a Protein sequence databases
Bioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras Lecture - 5a Protein sequence databases In this lecture, we will mainly discuss on Protein Sequence
More informationBLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments
BLAST 100 times faster than dynamic programming. Good for database searches. Derive a list of words of length w from query (e.g., 3 for protein, 11 for DNA) High-scoring words are compared with database
More informationEECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science
EECS 730 Introduction to Bioinformatics Sequence Alignment Luke Huan Electrical Engineering and Computer Science http://people.eecs.ku.edu/~jhuan/ Database What is database An organized set of data Can
More informationLecture 19A. DNA computing
Lecture 19A. DNA computing What exactly is DNA (deoxyribonucleic acid)? DNA is the material that contains codes for the many physical characteristics of every living creature. Your cells use different
More informationZool 3200: Cell Biology Exam 3 3/6/15
Name: Trask Zool 3200: Cell Biology Exam 3 3/6/15 Answer each of the following questions in the space provided; circle the correct answer or answers for each multiple choice question and circle either
More informationHe who asks is a fool for five minutes, but he who does not ask remains a fool forever. Today. Admin Stuff. CSE527 Computational Biology
CSE527 Computational Biology http://www.cs.washington.edu/527 He who asks is a fool for five minutes, but he who does not ask remains a fool forever. Larry Ruzzo Autumn 2006 -- Chinese Proverb UW CSE Computational
More informationFirst&year&tutorial&in&Chemical&Biology&(amino&acids,&peptide&and&proteins)&! 1.&!
First&year&tutorial&in&Chemical&Biology&(amino&acids,&peptide&and&proteins& 1.& a. b. c. d. e. 2.& a. b. c. d. e. f. & UsingtheCahn Ingold Prelogsystem,assignstereochemicaldescriptorstothe threeaminoacidsshownbelow.
More informationStructure formation and association of biomolecules. Prof. Dr. Martin Zacharias Lehrstuhl für Molekulardynamik (T38) Technische Universität München
Structure formation and association of biomolecules Prof. Dr. Martin Zacharias Lehrstuhl für Molekulardynamik (T38) Technische Universität München Motivation Many biomolecules are chemically synthesized
More informationAipotu II: Biochemistry
Aipotu II: Biochemistry Introduction: The Biological Phenomenon Under Study In this lab, you will continue to explore the biological mechanisms behind the expression of flower color in a hypothetical plant.
More informationSequence Databases and database scanning
Sequence Databases and database scanning Marjolein Thunnissen Lund, 2012 Types of databases: Primary sequence databases (proteins and nucleic acids). Composite protein sequence databases. Secondary databases.
More informationName: TOC#. Data and Observations: Figure 1: Amino Acid Positions in the Hemoglobin of Some Vertebrates
Name: TOC#. Comparing Primates Background: In The Descent of Man, the English naturalist Charles Darwin formulated the hypothesis that human beings and other primates have a common ancestor. A hypothesis
More informationCHAPTER 1. DNA: The Hereditary Molecule SECTION D. What Does DNA Do? Chapter 1 Modern Genetics for All Students S 33
HPER 1 DN: he Hereditary Molecule SEION D What Does DN Do? hapter 1 Modern enetics for ll Students S 33 D.1 DN odes For Proteins PROEINS DO HE nitty-gritty jobs of every living cell. Proteins are the molecules
More information7.014 Quiz II Handout
7.014 Quiz II Handout Quiz II: Wednesday, March 17 12:05-12:55 54-100 **This will be a closed book exam** Quiz Review Session: Friday, March 12 7:00-9:00 pm room 54-100 Open Tutoring Session: Tuesday,
More informationNCBI Molecular Biology Resources. NCBI Resources
NBI Molecular Biology Resources A Field Guide NBI Resources The NBI Entrez System NBI Sequence Databases Primary data: GenBank Derivative data: RefSeq, Gene Protein Structure and Function Sequence polymorphisms
More informationBiology: The substrate of bioinformatics
Bi01_1 Unit 01: Biology: The substrate of bioinformatics What is Bioinformatics? Bi01_2 handling of information related to living organisms understood on the basis of molecular biology Nature does it.
More informationImportant points from last time
Important points from last time Subst. rates differ site by site Fit a Γ dist. to variation in rates Γ generally has two parameters but in biology we fix one to ensure a mean equal to 1 and the other parameter
More information2018 Protein Modeling Exam Key
2018 Protein Modeling Exam Key Multiple Choice: 1. Which of the following amino acids has a negative charge at ph 7? a. Gln b. Glu c. Ser d. Cys 2. Which of the following is an example of secondary structure?
More informationThe combination of a phosphate, sugar and a base forms a compound called a nucleotide.
History Rosalin Franklin: Female scientist (x-ray crystallographer) who took the picture of DNA James Watson and Francis Crick: Solved the structure of DNA from information obtained by other scientist.
More informationAdditional Case Study: Amino Acids and Evolution
Student Worksheet Additional Case Study: Amino Acids and Evolution Objectives To use biochemical data to determine evolutionary relationships. To test the hypothesis that living things that are morphologically
More informationProteins: Wide range of func2ons. Polypep2des. Amino Acid Monomers
Proteins: Wide range of func2ons Proteins coded in DNA account for more than 50% of the dry mass of most cells Protein func9ons structural support storage transport cellular communica9ons movement defense
More informationWhy learn sequence database searching? Searching Molecular Databases with BLAST
Why learn sequence database searching? Searching Molecular Databases with BLAST What have I cloned? Is this really!my gene"? Basic Local Alignment Search Tool How BLAST works Interpreting search results
More informationSteroids. Steroids. Proteins: Wide range of func6ons. lipids characterized by a carbon skeleton consis3ng of four fused rings
Steroids Steroids lipids characterized by a carbon skeleton consis3ng of four fused rings 3 six sided, and 1 five sided Cholesterol important steroid precursor component in animal cell membranes Although
More informationBasic Concepts of Human Genetics
Basic Concepts of Human Genetics The genetic information of an individual is contained in 23 pairs of chromosomes. Every human cell contains the 23 pair of chromosomes. One pair is called sex chromosomes
More informationGenomics and Database Mining (HCS 604.3) April 2005
Genomics and Database Mining (HCS 604.3) April 2005 David M. Francis OARDC 1680 Madison Ave Wooster, OH 44691 e-mail: francis.77@osu.edu Introduction: Computers have changed the way biologists go about
More informationHow life. constructs itself.
How life constructs itself Life constructs itself using few simple rules of information processing. On the one hand, there is a set of rules determining how such basic chemical reactions as transcription,
More informationName. Student ID. Midterm 2, Biology 2020, Kropf 2004
Midterm 2, Biology 2020, Kropf 2004 1 1. RNA vs DNA (5 pts) The table below compares DNA and RNA. Fill in the open boxes, being complete and specific Compare: DNA RNA Pyrimidines C,T C,U Purines 3-D structure
More informationHe who asks is a fool for five minutes, but he who does not ask remains a fool forever. Tonight. Admin Stuff. CSEP590A Computational Biology
CSEP590A Computational Biology http://www.cs.washington.edu/csep590a He who asks is a fool for five minutes, but he who does not ask remains a fool forever. Larry Ruzzo Summer 2006 -- Chinese Proverb UW
More informationTHE GENETIC CODE Figure 1: The genetic code showing the codons and their respective amino acids
THE GENETIC CODE As DNA is a genetic material, it carries genetic information from cell to cell and from generation to generation. There are only four bases in DNA and twenty amino acids in protein, so
More informationLIST OF ACRONYMS & ABBREVIATIONS
LIST OF ACRONYMS & ABBREVIATIONS ALA ARG ASN ATD CRD CYS GLN GLU GLY GPCR HIS hstr ILE LEU LYS MET mglur1 NHDC PDB PHE PRO SER T1R2 T1R3 TMD TRP TYR THR 7-TM VFTM ZnSO 4 Alanine, A Arginine, R Asparagines,
More informationBiochemistry and Cell Biology
Biochemistry and Cell Biology Monomersare simple molecules that can be linked into chains. (Nucleotides, Amino acids, monosaccharides) Polymersare the long chains of monomers. (DNA, RNA, Cellulose, Protein,
More informationBioinformatics for Proteomics. Ann Loraine
Bioinformatics for Proteomics Ann Loraine aloraine@uab.edu What is bioinformatics? The science of collecting, processing, organizing, storing, analyzing, and mining biological information, especially data
More informationChemistry 121 Winter 17
Chemistry 121 Winter 17 Introduction to Organic Chemistry and Biochemistry Instructor Dr. Upali Siriwardane (Ph.D. Ohio State) E-mail: upali@latech.edu Office: 311 Carson Taylor Hall ; Phone: 318-257-4941;
More informationAipotu I & II: Genetics & Biochemistry
Aipotu I & II: Genetics & Biochemistry Objectives: To reinforce your understanding of Genetics, Biochemistry, and Molecular Biology To show the connections between these three disciplines To show how these
More informationProtein Sequence Analysis. BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl)
Protein Sequence Analysis BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl) Linear Sequence Analysis What can you learn from a (single) protein sequence? Calculate it s physical
More informationDaily Agenda. Warm Up: Review. Translation Notes Protein Synthesis Practice. Redos
Daily Agenda Warm Up: Review Translation Notes Protein Synthesis Practice Redos 1. What is DNA Replication? 2. Where does DNA Replication take place? 3. Replicate this strand of DNA into complimentary
More informationTwo Mark question and Answers
1. Define Bioinformatics Two Mark question and Answers Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three
More informationHe who asks is a fool for five minutes, but he who does not ask remains a fool forever. Today. Admin Stuff. CSE527 Computational Biology
CSE527 Computational Biology http://www.cs.washington.edu/527 He who asks is a fool for five minutes, but he who does not ask remains a fool forever. Larry Ruzzo Autumn 2007 -- Chinese Proverb UW CSE Computational
More informationTranslating the Genetic Code. DANILO V. ROGAYAN JR. Faculty, Department of Natural Sciences
Translating the Genetic Code DANILO V. ROGAYAN JR. Faculty, Department of Natural Sciences An overview of gene expression Figure 13.2 The Idea of A Code 20 amino acids 4 nucleotides How do nucleic acids
More informationIntroduc)on to Databases and Resources Biological Databases and Resources
Introduc)on to Bioinforma)cs Online Course : IBT Introduc)on to Databases and Resources Biological Databases and Resources Learning Objec)ves Introduc)on to Databases and Resources - Understand how bioinforma)cs
More information7.014 Problem Set 4 Answers to this problem set are to be turned in. Problem sets will not be accepted late. Solutions will be posted on the web.
MIT Department of Biology 7.014 Introductory Biology, Spring 2005 Name: Section : 7.014 Problem Set 4 Answers to this problem set are to be turned in. Problem sets will not be accepted late. Solutions
More informationName Section Problem Set 3
Name Section 7.013 Problem Set 3 The completed problem must be turned into the wood box outside 68120 by 4:40 pm, Thursday, March 13. Problem sets will not be accepted late. Question 1 Based upon your
More informationTurning λ Cro into a
Turning λ Cro into a 3 Transcriptional Activator Figure by MIT pencourseware. Fred Bushman and Mark Ptashne Cell (1988) 54:191-197 Presented by atalie Kuldell for 0.90 February 4th, 009 Small patch of
More informationConnect the dots DNA to DISEASE
Teachers Material Connect the dots DNA to DISEASE Developed by: M. Oltmann & A. James. California State Standards Cell Biology 1.d. Students know the central dogma of molecular biology outlines the flow
More information