Introduction to Bioinformatics

Introduction to Bioinformatics IMBB 2017 RAB, Kigali - Rwanda May 02 13, 2017 Joyce Nzioki

Plan for the Week Introduction to Bioinformatics Raw sanger sequence data Introduction to CLC Bio Quality Control De novo assembly Resolving conflicts BLAST and Biological databases DNA Barcoding Nucleotide sequence Analysis MSA and Phylogenetics Sequence depositing

What is Bioinformatics Bioinformatics is an interdisciplinary science that develops and improves on methods of storing, retrieving, organizing and analyzing biological data. This computational techniques are to solve biological problemsand discoverthewealth of biological information hidden in biological data.

Bioinformatics The design, construction and use of software tools to generate, store, annotate and analyse data and information relating to Molecular Biology.

Bioinformatics The design, construction and use of software tools to generate, store, annotate and analyse data and information relating to Molecular Biology. Here we consider the use of bioinformatics tools rather than their design and construction. Here we consider the access, storage and analysis of data and information items rather than the generation and annotation.

Bioinformatics Experiment Analysis Hypothesis DATA Sequence Structure RESULT Function Evolution Pathway Interaction Mutation expression

Major types of Bioinformatics Data Literature and ontologies Genomes Gene expression Protein sequence DNA & RNA sequence Protein structure DNA & RNA structure Protein families, motifs and domains Chemical entities Protein interactions Pathways Systems

Bioinformatics Research areas Include but not limited to Organization, classification, dissemination and analysis of biological andbiomedical data Biological sequence analysis and phylogenetics. Genome organization andevolution Regulation of gene expression andepiginetics Biological pathways and network in healthy and disease states Protein structure prediction fromsequence Modelling and prediction of the biophysical properties of biomolecules for binding prediction and drug design Design of biomolecularstructure andfunction With applications to Biology, Medicine, Agriculture and Industry

Where did bioinformatics come from? Bioinformatics arose as molecular biology begun to be transformed by the emergence of molecular sequence and structural data Recap: The key dogmas of molecular biology DNA sequence determined protein sequence Protein sequence determines protein structure Protein structure determines protein function Regulatory mechanisms (e.g. gene expression) determines the amount of a particular function in space and time Bioinformatics is now essential for the archiving, organization and analysis of data related to these processes

Bioinformatics involves the application of computer algorithms, computer models and computer databases with the broad goal of understanding the action of genes, transcripts, proteins and large collections in this entities The integration of information learned about this three biological processes gives insight Into the biology of organisms

How does it look like on a computer

A cdna sequence (reading frame) >gi 14456711 ref NM_000558.3 Homo sapiens hemoglobin, alpha 1 (HBA1), mrna ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCC GCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCAC CACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCG ACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCAC AAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGC CGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACC GTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCC GTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC A protein sequence >gi 4504347 ref NP_000549.1 alpha 1 globin [Homo sapiens] MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAH VDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

How do we actually do Bioinformatics? Prepackage tools and databases vmany online and open source vsome are commercial Tool development vmostly on UNIX environment vknowledge of programming requires(python, Perl, R, C, Java) vmay require specialized or high performance computing resources

History of Bioinformatics

Sequencing DNA sequencing is a process of determining the order of nucleotides within a DNA molecule.

History of DNA sequencing 1976: Maxam Gilbert sequencing 1977: Sanger sequencing (dideoxy chain termination) 1986: Flourescently labelled ddntps 1987: Applied Biosystems (ABI 370) 1988: Capillary gell electrophoresis 1999: Applied Biosystems ABI 3700 DNA Analyzer 2005 > : Next generation sequencing

Next Generation Sequencing Illumina MiniSeq Illumina MiSeq Illumina NextSeq Ion PGM PacBio RS II PacBio Sequel Ion Proton ONT MinION CTLGH Introduction to Bioinformatics, 13-17 Feb 2017, Nairobi Illumina HiSeq Illumina NovaSeq Ion S5 ONT PromethION Intro to NGS Sequencing Technologies ONT SmidgION Bert Overduin 14

Applications of Bioinformatics Microbial genome applications Molecular medicine Personalized medicine Preventive medicine Gene therapy Drug development Antibiotic resistance Evolutionary studies Biotechnology Climate change studies Crop improvement Forensic analysis Insect resistance Improve nutritional quality Development of drought resistant varieties Veterinary science Bioengineering Agriculture biotechnology.

Limitations of Bioinformatics Bioinformatics is a science of inference hence: Quality of bioinformatics predictions depends on the quality of data and sophistication of algorithms. Sequence data may have errors which subsequently leads to errors in downstream analysis. Many exhaustive algorithms cannot be used due to computational limitations. Trade-off between specificity and sensitivity

Why bioinformatics then In most cases biologics /wet lab is needed to validate bioinformatic predictions Bioinformatics can: Reduce data to a small set of testable predictions Assign a degree of confidence to each prediction The biologist will often have to choose the appropriate degree of confidence, depending on: Cost of validating predictions. Benefit expected from the right predictions. Data mining - the process by which testable hypothesis are generated regarding the function or structure of a gene or protein of interest by identifying homologs in better characterized organisms. Bioinformatics as in sillico biology: Allows for exploration of domains that cannot be addressed manually e.g study of past evolutionary events / patterns.

The End Acknowledging Bert Overduin University or Edinburgh and EBI online courses for some slides

Thank you IMBB 2017 RAB, Kigali - Rwanda May 02 13, 2017 Joyce Nzioki j.n.njuguna@cgiar.org