Computational Methods in Bioinformatics Ying Xu 2017/12/6 1
What We Intend to Teach A general introduction to the field of bioinformatics what problems people have been and are currently working on how people solve these problems what key computational techniques are used and needed how much help computing has provided to biological research A way of thinking -- tackling biological problem computationally how to look at a biological problem from a computational point of view how to collect statistics from biological data how to build a computational model how to solve a computational modeling problem algorithmically 2017/12/6 2
What We Intend to Teach Some exposure to computational biology and bioinformatics research what are some of the interesting research areas key challenges available tools and resources Basic topics of bioinformatics, covering computational genomics computational proteomics structural bioinformatics computational systems biology 2017/12/6 3
Detailed Topics Introduction to bioinformatics Biosequence alignment Computational gene finding Protein structure prediction transcription regulation motif finding gene-expression data analysis protein function prediction comparative genomics, including operon prediction pathways and network prediction 2017/12/6 4
Prerequisites A little knowledge in computer programming Basic knowledge of molecular biology Strong interests in learning bioinformatics and computational biology! 2017/12/6 5
Reference book References Current topics in computational molecular biology, T Jiang, Y Xu and MQ Zhang, MIT Press, 2002 Required reading Primer on Molecular Genetics, US Department of Energy Handouts by instructors http://csbl.bmb.uga.edu/mirrors/jlu/bioinformaticscourse/ 2017/12/6 6
Workload Regular class attendance One term project To analyze a bacterial genome to predict and summarize Protein encoding genes Functions Operons Structures 2017/12/6 7
A Brief Introduction to Bioinformatics and Computational Biology 2017/12/6 8
The Basics ccgtacgtacgtagagtgctagtctagtcgtagcgccgtagtcgatc gtgtgggtagtagctgatatgatgcgaggtaggggataggatagc aacagatgagcggatgctgagtgcagtggcatgcgatgtcgatga tagcggtaggtagacttcgcgcataaagctgcgcgagatgattgc aaagragttagatgagctgatgctagaggtcagtgactgatgatcg atgcatgcatggatgatgcagctgatcgatgtagatgcaataagtc gatgatcgatgatgatgctagatgatagctagatgtgatcgatggta ggtaggatggtaggtaaattgatagatgctagatcgtaggta cell chromosome genome: DNA sequence genes protein metabolic pathway/network 2017/12/6 9
Information storage Working molecules 2017/12/6 10
Bioinformatics (or computational biology) ccgtacgtacgtagagtgctagtctagtcgtagcgccgtagtcgatc gtgtgggtagtagctgatatgatgcgaggtaggggataggatagc aacagatgagcggatgctgagtgcagtggcatgcgatgtcgatga tagcggtaggtagacttcgcgcataaagctgcgcgagatgattgc aaagragttagatgagctgatgctagaggtcagtgactgatgatcg atgcatgcatggatgatgcagctgatcgatgtagatgcaataagtc gatgatcgatgatgatgctagatgatagctagatgtgatcgatggta ggtaggatggtaggtaaattgatagatgctagatcgtaggta This interdisciplinary science is about providing computational support to studies on linking the behavior of cells, organisms and populations to the information encoded in the genomes Temple Smith 2017/12/6 11
Bioinformatics It is about developing and using computational techniques to interpret biological data predict structures and functions of biological entities model the dynamic behavior of biological processes and systems People have used mathematical or computing techniques to solve biological problems since early 1900 s e.g., evolution and genetic analyses by R.A. Fisher, J.B.S. Haldane, S. Wright So what is new? 2017/12/6 12
A Historical Perspective Realization of the existence of genes in our cells by Hermann Müller, a student of Morgan (1921) Understanding of the physical natures of genes by F. Sanger (e.g., 1949), E. Chargaff (e.g., 1950), J. Kendrew (e.g., 1958) in 40 and 50 s 2017/12/6 13
A Historical Perspective Understanding of the double helical structure of DNA by James Watson and Frances Crick in 1953 Development of sequencing technology, first of proteins and then of genomic DNA, based on the work of F. Sanger on sequencing of insulin (1956), W Gilbert and A. Maxam on sequencing of Lactose operator (1977) which demonstrated that the genetic sequence of a genome, including human s, is sequence-able! 2017/12/6 14
A Historical Perspective Development of a science of analyzing protein and DNA sequences, particularly in protein sequence analyses and evolution by Margaret Dayhoff (60 s) phylogenetic analyses and comparative sequence analyses by W. Fitch and E. Margoliash (1967) and by R. Doolittle (1983) 2017/12/6 15
A Historical Perspective Development of sequence comparison algorithms Needleman and Wunsch (1970) Smith and Waterman (1981) Organization of biological data into databases Protein Data Bank (PDB, 1973) of protein structures GENBANK (1982) of DNA sequences Computational methods for gene finding in genomic sequences Work by Borodovsky, Claverie, Uberbacher from mid-80 s to early 90 s 2017/12/6 16
A Historical Perspective Sequencing of Human and other genomes (1986 2003) Development of high-throughput measurement technologies microarray/rna-seq for functional states of genes two-hybrid systems for protein-protein interactions DNA methylation data for epigenomic information structural genomics for structure determination ccgtacgtacgtagagtgctagtct agtcgtagcgccgtagtcgatcgt gtgggtagtagctgatatgatgcga ggtaggggataggatagcaacag atgagcggatgctgagtgcagtgg 2017/12/6 catgcgatgtgatagctagatgtga 17 tcgatggtaggtaggatggtaggt genes
A Historical Perspective These high-throughput probing technologies and others are being used to generate enormous amounts of data about the existence, the structure, the functional state, the relationship of biological molecules and machineries.. but what are these data telling us? 2017/12/6 18
A DNA Sequence ccgtacgtacgtagagtgctagtctagtcgtagcgccgtagtcgatcgtgtgg gtagtagctgatatgatgcgaggtaggggataggatagcaacagatgagc ggatgctgagtgcagtggcatgcgatgtcgatgatagcggtaggtagacttc gcgcataaagctgcgcgagatgattgcaaagragttagatgagctgatgcta gaggtcagtgactgatgatcgatgcatgcatggatgatgcagctgatcgatgt agatgcaataagtcgatgatcgatgatgatgctagatgatagctagatgtgat cgatggtaggtaggatggtaggtaaattgatagatgctagatcgtaggtagta gctagatgcagggataaacacacggaggcgagtgatcggtaccgggctg aggtgttagctaatgatgagtacgtatgaggcaggatgagtgacccgatgag gctagatgcgatggatggatcgatgatcgatgcatggtgatgcgatgctagat gatgtgtgtcagtaagtaagcgatgcggctgctgagagcgtaggcccgaga ggagagatgtaggaggaaggtttgatggtagttgtagatgattgtgtagttgta gctgatagtgatgatcgtag 2017/12/6 19
A Historical Perspective So what is new? It is the amount, the type of and relationships among the biological data about the cellular states, molecular structures and functions, generated by highthroughput technologies, that have driven the rapid advancement of bioinformatics! 2017/12/6 20
An Example of Computation for Biology Lactococcus is a premier model microorganism for a wide array of studies in molecular biology, and is nonpathogenic Streptococcus is closely related to Lactococcus and could become pathogenic upon triggers Question: What make one pathogenic and the other nonpathogenic? specific genes? unique pathways? different regulatory mechanisms? 2017/12/6 21
An Example of Computation for Biology X years ago,. to search for potential genes that possibly make the difference, researchers had to remove various parts of DNA sequence, then observe if they may have any relevance x x x x acggtcgtacgtacgtgttagccgataatccagtgtgagatacacatcatcgaaacacatgaggcgtgcgatagatgatcc...???? This could be a very lengthy process 2017/12/6 22
An Example of Computation for Biology Since the Human genome project (1986), computational scientists have developed computer programs to locate genes in genomic DNA sequence GRAIL, Gene-Scan, Glimmer,. acggtcgtacgtacgtgttagccgataatccagtgtgagatacacatcatcgaaacacatgaggcgtgcgatagatgatcc... genes With gene-prediction programs, researchers only need to knock-out regions predicted to be genes in their search for relevant genes 2017/12/6 23
Example of Computation for Biology Over the years, many genes have been thoroughly studied in different organisms, e.g., human, mouse, fly,., rice, their biological functions have been identified and documented Computational scientists have developed computer programs to associate newly identified genes to genes with known functions! existing methods can associate > 60-80% of newly identified genes to genes with known functions Now, researchers only need to knock-out genes with possibly relevant functions in their search for understanding of a
Example of Computation for Biology Computational programs have been springing out that can predict if two proteins interact with each other if a group of gene products work in the same pathway which regulators regulate a particular pathway functions of genes at a genome scale se capabilities allow researchers to study complex biology problems like erstanding the difference between Lactococcus and Streptococcus in a
Example of Computation for Biology comparative genome analysis entify unique genes in each genome entify unique metabolic pathways and regulatory networks
Examples of Computation for Biology omputer-assisted drug design 3D structure models of G protein-coupled receptors were used to computationally screen 100,000+ compounds as possible drug targets and 100 were selected Follow-up experiments confirmed a high hit rate of 12%-21%
Examples of Computation for Biology omputational analysis of Plasmodium falciparum metabolism Plasmodium causes human malaria computational prediction of metabolic pathways of plasmodium computational simulations have helped to identify 216 chokepoints in this pathway model among all 24 previously suggested drug targets, 21 target at the chokepoints among the three popular drugs for malaria, they all targeted at the chokepoints
Computation for Biology Though computation may not solve a biological problem directly, it can help quickly narrow down the search space rching a needle in a haystack
Change of Paradigm he human genome sequencing project has led to fundamental hanges in how biological science is done! It represents biology s first foray into big science Science, editorial, 2003 he coordinated efforts in high-throughput production of biological ata beyond sequences have fueled the rapid transition of biology om cottage industry science to big science functional genomic data structural genomic data epigenomic data proteomic data metabolonic data
Change of Paradigm h-throughput ta production data mining and interpretation models or hypotheses domain expertise rational experimental design specific data generation
Change of Paradigm Howdy, want to do biology together? omic data
amples of Integrated Computational and Experimental Biology ntification of disordered regions in proteins perimental data suggest that some portions of some protein uctures do not have rigid structures mputational studies suggested that disorder is a common enomena, particularly in signaling proteins, validated experimentally mputational prediction ggests that protein-protein eraction interfaces are often ordered
DH What Could Potentially be Done glucose CO 2 PEP 2NAD 2NADH lactate Biosynthesis in E coli pyruvate NAD NADH NAD NADH NAD NADH formate CO 2 acetyl CoA oxaloacetate ADP ATP 2NADH 2NAD acetate acetate malate citrate ethanol ethanol fumarate isocitrate
Biology for Computation hat computational science is to molecular iology is like what mathematics has been to hysics...
Look into the Future ike physics, where general rules and laws are taught at e start, biology will surely be presented to future enerations of students as a set of basic systems... uplicated and adapted to a very wide range of cellular nd organismic functions, following basic evolutionary rinciples constrained by Earth s geological history. Temple Smith, Current Topics in Computational Molecular Biology
Take-Home Message he driving force of bioinformatics is biological data roduction through high-throughput technologies omputation is becoming increasingly indispensable in iological research ombination of high-throughput data generation and omputation allows scientists to look at more complex iological problems at systems level