Bioinformatics. Dick de Ridder. Tuinbouw Digitaal, 12/11/15

1 Bioinformatics Dick de Ridder Tuinbouw Digitaal, 12/11/15

2 Bioinformatics is not

3 Bioinformatics is also not

4 Bioinformatics

5 Bioinformatics (2)

6 Bioinformatics (3)
Definitions from the US National Institutes of Health (NIH):
Bioinformatics: research, development, or application of tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze or visualize such data.
Computational biology: the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
Systems biology: a scientific approach that combines the principles of engineering, mathematics, physics, and computer science with extensive experimental data to develop a quantitative as well as a deep conceptual understanding of biological phenomena, permitting prediction and accurate simulation of complex (emergent) biological behaviors.

7 Data science
[Diagram: data science at the centre, surrounded by data engineering, scientific method, domain expertise, hacker mindset, mathematics, statistics, visualisation and advanced computing]

8 Bioinformatics: biological data science
[Diagram: bioinformatics at the centre, surrounded by data engineering, scientific method, biology, hacker mindset, mathematics, statistics, visualisation and advanced computing]

9 Wageningen Bioinformatics Group
We develop and apply novel computational methods to get from omics data to biological knowledge in the green life sciences.

10 Classic biology

11 Modern biology
Study life at the molecular level: DNA (genes), RNA, proteins, metabolites

12 Modern biology (2)
[Figure: molecular network of a cell]
Components: Gene A-E; mRNA A-D; miRNA E; Protein A-D and F; Enzyme A, B; Complex CD; Metabolite 1-3; other cells
Processes: transcription regulation, RNA interference, metabolic control, complex formation, protein interaction, protein activation, signaling

13

14 The omics revolution
Measure many (or even all) molecules or interactions at once:
Sequencing: DNA, RNA
Immunoprecipitation: interactions
Microarray: mRNA
Crystallography: protein structure
Mass spectrometry: proteins & metabolites

15 DNA sequencing
Sanger sequencing
First RNA gene and genome read (bacteriophage MS2)
1977: Sanger sequencing
1985: automated, ABI 3730XL

16 Biological databases First database for protein sequences: Margaret Dayhoff, 1965

17 Biological databases (2) Walter Goad, 1982: GenBank, for DNA sequences

18 Human Genome Project (HGP)
3.3 billion bases, approx. $2.7 billion

19 Next-generation sequencing (2005-now)
Illumina MiSeq, Illumina HiSeq, IonTorrent, Oxford Nanopore, PacBio

20 Next-generation sequencing (2)
In 2015: a human genome for ± $1000, in a few days
Worldwide sequencing capacity: 25 petabases/year
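The price points quoted for the Human Genome Project and for 2015 can be put side by side as a rough cost per base. The following is only a back-of-the-envelope sketch: it takes the slide figures at face value (3.3 billion bases, approx. $2.7 billion, ± $1000, 25 petabases/year), ignores sequencing coverage, and uses illustrative variable names that do not come from the talk.

```python
# Figures as quoted on the slides; all are approximate.
GENOME_BASES = 3.3e9               # human genome size in bases
HGP_COST_USD = 2.7e9               # approximate cost of the Human Genome Project
COST_2015_USD = 1000               # approximate cost of one genome in 2015
CAPACITY_BASES_PER_YEAR = 25e15    # 25 petabases per year worldwide

# Cost per base, then and in 2015.
hgp_cost_per_base = HGP_COST_USD / GENOME_BASES
cost_per_base_2015 = COST_2015_USD / GENOME_BASES

# How many times cheaper a genome had become.
fold_cheaper = HGP_COST_USD / COST_2015_USD

# Yearly capacity expressed in single-pass human genome equivalents
# (an assumption: real projects sequence each genome many times over).
genome_equivalents = CAPACITY_BASES_PER_YEAR / GENOME_BASES

print(f"HGP:  ${hgp_cost_per_base:.2f} per base")
print(f"2015: ${cost_per_base_2015:.1e} per base")
print(f"Cost reduction: {fold_cheaper:,.0f}-fold")
print(f"Capacity: {genome_equivalents:,.0f} genome equivalents per year")
```

Because real sequencing reads every position many times, the number of genomes that this capacity actually covers is correspondingly lower.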

21 How much is a petabyte?
100 GB = ±20 DVDs per human genome
1 PB = 10,000 genomes
1 PB = 200,000 DVDs = 240 m = 500 2 TB disks (250 k)
Michael Schatz, 2014
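The petabyte comparison is simple arithmetic and can be checked in a few lines. This is a sketch under stated assumptions: decimal units (1 PB = 10^15 bytes), and a disc thickness of 1.2 mm, which is not on the slide but would make the 240 m figure the height of the stacked DVDs.

```python
# Decimal storage units (assumption: the slide uses powers of ten).
GB = 10**9
TB = 10**12
PB = 10**15

genome_bytes = 100 * GB      # data per human genome, from the slide
dvds_per_genome = 20         # from the slide
dvd_thickness_um = 1200      # 1.2 mm per disc; assumption, not on the slide
disk_bytes = 2 * TB          # one 2 TB hard disk

genomes_per_pb = PB // genome_bytes
dvds_per_pb = genomes_per_pb * dvds_per_genome
stack_height_m = dvds_per_pb * dvd_thickness_um / 1_000_000
disks_per_pb = PB // disk_bytes

print(genomes_per_pb, "genomes per petabyte")
print(dvds_per_pb, "DVDs per petabyte")
print(stack_height_m, "m of stacked DVDs")
print(disks_per_pb, "disks of 2 TB per petabyte")
```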

22 Genomics

23 Genomics (2)

24 And those are just the instruction manuals
[Figure: the molecular network of slide 12, repeated]

25 There is also data on how it is used
Proteomics, transcriptomics, metabolomics, phenomics

26 ... and prior information

27 The omics revolution
Marc Vidal & Eileen Furlong, Nature Reviews Genetics 5(10), 2004

28 The data flood/tsunami/explosion
Images: Randall Dennis, The Economist, myitforum.com

29 The database flood/tsunami/explosion

30 The amount of data is not the problem
Image: Drew Sheneman, The Newark Star Ledger

31 making sense of it is
Wisdom: improve?
Understanding: why?
Knowledge: how?
Information: what?
Data
Russell Ackoff, Journal of Applied Systems Analysis 16, 1989

32 Bioinformatics research
[Pyramid with three numbered steps: Data → (1) → Information → (2) → Knowledge → (3) → Understanding → Wisdom]

33 1. From data to information: analysis
How do we get as much information out of the data as possible?
[Pyramid: Data, Information, Knowledge, Understanding, Wisdom]
Image: Drew Sheneman, The Newark Star Ledger

34 Deciphering complex plant genomes
Introgression browser (Aflitos et al., Plant Journal 2015)
Optical mapping

35 Pan-genome representations
Graph database construction
Interactive visualisation
Mining genetic variation
[Figure legend: gene, 1x, 2x, SNP, deletion, insertion]

36 Methods for polyploid crops
Finding genetic variants
Reconstructing the individual chromosomes (8N)
TGAAATGATTGCTTCG
TGAAATGATTGGTTCG
TGATATGATTGGTTCG
TGATATGATTGCTTCG
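The four example sequences on this slide differ at only a few positions, and finding those positions is the first step toward telling the chromosome copies apart. A minimal sketch of that step, assuming the sequences are already aligned and of equal length (real polyploid data arrives as short, error-prone reads, so this is an illustration and not the group's method; the function name is hypothetical):

```python
# The four example sequences from the slide.
haplotypes = [
    "TGAAATGATTGCTTCG",
    "TGAAATGATTGGTTCG",
    "TGATATGATTGGTTCG",
    "TGATATGATTGCTTCG",
]

def variable_sites(seqs):
    """Return {position: sorted alleles} for alignment columns that differ.

    Positions are 0-based. Assumes all sequences are aligned and equally long.
    """
    sites = {}
    for pos, column in enumerate(zip(*seqs)):
        alleles = sorted(set(column))
        if len(alleles) > 1:
            sites[pos] = alleles
    return sites

sites = variable_sites(haplotypes)

# Each sequence reduced to its alleles at the variable sites only.
patterns = ["".join(seq[pos] for pos in sites) for seq in haplotypes]

print(sites)     # variable positions and the alleles seen there
print(patterns)  # one allele pattern per sequence
```

Two variable sites with two alleles each allow at most four distinct patterns, and the slide's example shows all four of them.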

37 2. From information to knowledge: models
How can we optimally integrate measurements and prior knowledge?
[Pyramid: Data, Information, Knowledge, Understanding, Wisdom]

38 Black-box models
Predicting protein production (van den Berg et al., PLoS ONE 2012)
Sequence              Produced?
IISVALANALSFHRA       No
AHQSKYPFEQ            No
ASQSSSDWDSVRDQSVQR    Yes
QVLLLDDLLTENDVELF     No
WHAVRIEALAASVLSQ      Yes
TERLIRFASQA           No or Yes?
[Figure: measurements and prior knowledge, plotted as feature 1 against feature 2]
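The idea of a black-box model (turn each sequence into a few numeric features, then let a classifier separate "produced" from "not produced") can be sketched with a toy example. Everything here beyond the slide's sequences and labels is an assumption made for illustration: the two features (fraction of hydrophobic and of charged residues), the residue groupings, and the 1-nearest-neighbour rule. It is not the method of van den Berg et al., and five short fragments are far too few to train anything meaningful.

```python
# Hypothetical residue groupings, chosen only for illustration.
HYDROPHOBIC = set("AVILMFW")
CHARGED = set("DEKR")

def features(seq):
    """Map a protein sequence to (fraction hydrophobic, fraction charged)."""
    n = len(seq)
    return (
        sum(aa in HYDROPHOBIC for aa in seq) / n,
        sum(aa in CHARGED for aa in seq) / n,
    )

# Sequences and labels as listed on the slide (pairing assumed from order).
training = [
    ("IISVALANALSFHRA", "No"),
    ("AHQSKYPFEQ", "No"),
    ("ASQSSSDWDSVRDQSVQR", "Yes"),
    ("QVLLLDDLLTENDVELF", "No"),
    ("WHAVRIEALAASVLSQ", "Yes"),
]

def predict(seq):
    """Return the label of the training sequence nearest in feature space."""
    f1, f2 = features(seq)

    def squared_distance(example):
        g1, g2 = features(example[0])
        return (f1 - g1) ** 2 + (f2 - g2) ** 2

    return min(training, key=squared_distance)[1]

query = "TERLIRFASQA"
print(features(query), predict(query))
```

In practice the features would encode prior knowledge about the protein, and the classifier would be trained and validated on many measured examples.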

39 Modeling biological phenomena
Topics: root development, blight-tomato interaction, protein-protein interaction (figure label: PLT2)
References: Santuari et al., in preparation; Puranik et al., Plant Cell 2014; Birch et al., Trends Microb. 2014

40 Modeling full cells Karr et al., Cell

41 3. From knowledge to understanding: intervention and (re)design
How can we use models to predict optimal interventions or redesigns?
[Figure: the molecular network of slide 12, with allosteric regulation added]
[Pyramid: Data, Information, Knowledge, Understanding, Wisdom]

42 Model-based redesign
(van den Berg et al., PEDS 2014)
TERLIRFASQA, with Q → K: No or Yes?
[Figure: measurements and prior knowledge, plotted as feature 1 against feature 2]
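Model-based redesign can be read as a search problem: generate candidate variants of a sequence and rank them with a predictive model. The sketch below enumerates all single-residue substitutions of the slide's example sequence. The scoring function is a deliberately trivial, hypothetical stand-in for a trained model, and all function names are illustrative; none of this is taken from van den Berg et al.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def mutate(seq, pos, new_residue):
    """Return seq with the residue at 0-based position pos replaced."""
    return seq[:pos] + new_residue + seq[pos + 1:]

def single_substitutions(seq):
    """Yield (position, old, new, variant) for every single substitution."""
    for pos, old in enumerate(seq):
        for new in AMINO_ACIDS:
            if new != old:
                yield pos, old, new, mutate(seq, pos, new)

def best_redesigns(seq, score, top=3):
    """Rank all single substitutions by a model score, highest first."""
    variants = list(single_substitutions(seq))
    return sorted(variants, key=lambda v: score(v[3]), reverse=True)[:top]

def toy_score(seq):
    """Hypothetical stand-in for a trained model: fraction of lysines."""
    return seq.count("K") / len(seq)

wild_type = "TERLIRFASQA"
q_to_k = mutate(wild_type, wild_type.index("Q"), "K")  # the slide's Q -> K
candidates = list(single_substitutions(wild_type))
top_variants = best_redesigns(wild_type, toy_score)

print(q_to_k)
print(len(candidates), "single-substitution variants")
```

With a real model in place of `toy_score`, the same loop would propose the substitutions predicted to be most likely to switch the outcome from "No" to "Yes".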

43 Synthetic biology: from reading to writing
Topics: DNA design, synthetic genomics, metabolic engineering
References: Mutalik et al., Nat. Biotech; Gibson et al., Science 2010; Sliva et al., Genetics 2015

44 Take-home messages
Due to the omics revolution, modern biology is turning into a data science.
Bioinformatics delivers algorithms and models to get the most out of this data.
For now, it is used mainly in the lab, but real-world applications are around the corner.

45 Thank you!