Bioinformatics Some selected examples... and a bit of an overview Department of Biostatistics Johns Hopkins Bloomberg School of Public Health July 19, 2007 @ EnviroHealth Connections
Bioinformatics and Computational Biology Wikipedia: Bioinformatics and computational biology involve the use of techniques including applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems usually on the molecular level. Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution.
Bioinformatics and Computational Biology Wikipedia: The terms bioinformatics and computational biology are often used interchangeably. However bioinformatics more properly refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems inspired from the management and analysis of biological data. Computational biology, on the other hand, refers to hypothesis-driven investigation of a specific biological problem using computers, carried out with experimental or simulated data, with the primary goal of discovery and the advancement of biological knowledge.
Bioinformatics and Computational Biology NIH definition of Bioinformatics and Computational Biology: Bioinformatics and computational biology are rooted in life sciences as well as computer and information sciences and technologies. Both of these interdisciplinary approaches draw from specific disciplines such as mathematics, physics, computer science and engineering, biology, and behavioral science. Bioinformatics applies principles of information sciences and technologies to make the vast, diverse, and complex life sciences data more understandable and useful. Computational biology uses mathematical and computational approaches to address theoretical and experimental questions in biology. Although bioinformatics and computational biology are distinct, there is also significant overlap and activity at their interface.
Bioinformatics and Computational Biology NIH definition of Bioinformatics and Computational Biology: The NIH Biomedical Information Science and Technology Initiative Consortium agreed on the following definitions of bioinformatics and computational biology recognizing that no definition could completely eliminate overlap with other activities or preclude variations in interpretation by different individuals and organizations. Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
The central dogma of biology Drawn by Ebbe Sloth Andersen http://old.mb.au.dk/
Topics DNA RNA Protein DNA Sequence analysis, genome annotation, evolutionary biology, phylogeny, DNA alterations, comparative genomics, SNP association studies RNA Analysis of gene expression and regulation Proteins Analysis of protein expression, protein-protein docking, prediction of protein structure Systems Biology: modeling biological systems, gene/protein networks
Some selected examples 1 Chromosomal alterations 2 Protein structure prediction 3 2D gel electrophoresis
Some selected examples 1 Chromosomal alterations 2 Protein structure prediction 3 2D gel electrophoresis
Karyotypes
Trisomy http://www.medgen.ubc.ca
DNA changes
The data
Deletion
FISH
Amplification
Uniparental Isodisomy
Cancer samples
Mosaicism
SNPchip S4 classes and methods
Estimation 1 By SNP: Estimate genotype and copy number for each SNP. 2 Within a sample: Borrow strength between SNPs to infer regions of LOH and copy number changes. 3 Between samples: Comparison between normal and disease populations to find chromosomal alterations associated with disease.
Vanilla ICE 5 4 3 Deletion Normal LOH Amplification A D B C E 2 1 Van ICE A D B E 3 2 1 Van ICE 50 51 52 53 54 55 69 70 71 72 73 74 174 176 178 Mb 238 240 242 244 246
A HapMap sample 5 4 3 2 Deletion Normal LOH Amplification 1 Van ICE 100 150 190 200 5 4 3 2 1 Van ICE 135 140 143 145 150 Mb 149 149.5 150 150.5 151 151.5 152
Many HapMap samples
SNP Trio
HMM for SNP Trio chromosome 10 chromosome 22 BPI BPI UPI F UPI F UPI M UPI M MI D MI D MI S non BPI MI S non BPI BPI BPI 0 20 40 60 80 100 120 140 position (Mb) 15 20 25 30 35 40 45 50 position (Mb)
HMM for SNP Trio 0.9 1 8 BPI UPI P UPI M MI D MI S HMM 0.9 1 8 BPI UPI P UPI M MI D MI S HMM copy number 1 0.5 0 0.5 1 1.5 2 child copy number 1 0.5 0 0.5 1 1.5 2 child mother mother copy number 1 0.5 0 0.5 1 1.5 2 copy number 1 0.5 0 0.5 1 1.5 2 copy number 1 0.5 0 0.5 1 1.5 2 father copy number 1 0.5 0 0.5 1 1.5 2 father 0 20 40 60 80 100 120 135 15 20 25 30 35 40 45 49
Some selected examples 1 Chromosomal alterations 2 Protein structure prediction 3 2D gel electrophoresis
Proteins Amino acids are the building blocks of proteins.
Proteins Both figures show the same protein (the bacterial protein L). The right figure also highlights the secondary structure elements.
Proteins From Lehninger, Principles of Biochemistry
Functional Annotation
Genome Wide Annotation
Some selected examples 1 Chromosomal alterations 2 Protein structure prediction 3 2D gel electrophoresis
2D Gel Electrophoresis
2D Gel Electrophoresis 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 0 0 100 100 200 200 300 300 400 400 500 500 600 600 700 700 800 800 900 900 1000 1000 1100 1100 1200 1200 1300 1300 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500
2D Gel Electrophoresis A B A:1 A:2 A:3 A:4 A:5 A:6 A:7 A:8 A:9 A:10 A:11 A:12 B:1 B:2 B:3 B:4 B:5 B:6 B:7 B:8 B:9 B:10 B:11 B:12
2D Gel Electrophoresis A B A:1 A:2 A:3 A:4 B:1 B:2 B:3 B:4
2D Gel Electrophoresis % reduction of concentration as compared to background 60 40 20 0 1 st Trimester 3 rd Trimester 20 Folate Placebo
2D Gel Electrophoresis 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 0 0 100 100 200 200 300 300 400 400 500 500 600 600 700 700 800 800 900 900 1000 1000 1100 1100 1200 1200 1300 1300 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500
2D Gel Electrophoresis 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 0 0 100 248 100 332 200 200 300 586 300 400 662 400 500 790 797 770 768 500 853 600 600 948 700 1056 1026 700 800 800 900 900 1458 1000 1000 1100 1100 1200 1200 1300 1300 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500
http://biostat.jhsph.edu/ iruczins/