Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il TA: Oleg Rokhlenko Lecture 1 Introduction to Bioinformatics
Introduction to Bioinformatics What is Bioinformatics? Why do we need it? Development timeline Journals, books, websites How to access bioinformatics tools? Why is bioinformatics hard? PubMed and OMIM databases 2
Bioinformatics: What? NCBI: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. Lincoln Stein: Biologists using computers, or the other way around. Martin Gerstel (Compugen): Bioinformatics is a name which will probably disappear with time. 3
Bioinformatics: Why? Storing large quantities of data Sequencing Crystallography DNA chips Enabling fast retrieval Database searching Data mining and analysis Integrating diverse sources 4
Human Genome Project Initiated in 1988, declared complete in 2003 Major goals Determine 3 × 10^9 base pairs Identify ~30,000 genes Computational tasks Storage and indexing Building contigs Scanning for genes 5
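To give a rough sense of the storage task, the sketch below estimates how much space 3 × 10^9 base pairs would occupy under two simple encodings. The 2-bit and 8-bit encodings and the function name `genome_storage_bytes` are illustrative assumptions, not something the slide specifies; real formats also carry headers, ambiguity codes, and annotation.

```python
def genome_storage_bytes(base_pairs: int, bits_per_base: int) -> int:
    """Bytes needed to store one strand, rounded up to whole bytes."""
    return (base_pairs * bits_per_base + 7) // 8


HUMAN_GENOME_BP = 3 * 10**9  # figure taken from the slide

# Assumption: four bases (A, C, G, T) fit in 2 bits each.
packed = genome_storage_bytes(HUMAN_GENOME_BP, 2)
# Assumption: plain text uses one 8-bit character per base.
as_text = genome_storage_bytes(HUMAN_GENOME_BP, 8)

print(f"2-bit packed: {packed / 10**6:.0f} MB")  # 750 MB
print(f"Plain text:   {as_text / 10**9:.0f} GB")  # 3 GB
```

The raw sequence itself is therefore modest in size; indexing, assembly, and gene scanning account for most of the computational work.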
Human Genome Progress Source: EMBL Genome Monitoring Table 6
IBM's Blue Gene Task: in-silico protein folding Announced 1999 Expanded in 2001 500,000 times faster than Pentium IV Aim: Fold one protein per year 7
Bioinformatics: When? Timeline, 1955 to 1985, events in the order shown: Watson and Crick DNA model; Sanger sequences insulin protein; N-W (Needleman-Wunsch) sequence alignment; ARPANET (early Internet); PDB (Protein Data Bank); Sanger dideoxy DNA sequencing; GenBank database; PCR (Polymerase Chain Reaction) 8
Timeline (continued), 1990 to 2000, events in the order shown: USA's NCBI; FASTA algorithm; SWISS-PROT database; Human Genome Initiative; BLAST algorithm; WWW (World Wide Web); Israel's INN; Europe's EBI; Celera Genomics; First human genome draft 9
GenBank Growth Source: NCBI 10
PubMed Growth [Chart: articles in database (0 to 14,000,000) by year, 1959 to 2001] 11
Bioinformatics: Where? Journals 12
Books David W. Mount, Bioinformatics: Sequence and Genome Analysis Cynthia Gibas, Developing Bioinformatics Computer Skills Bryan P. Bergeron, Bioinformatics Computing 13
World Wide Web USA National Center for Biotechnology Information: www.ncbi.nlm.nih.gov European Bioinformatics Institute: www.ebi.ac.uk ExPASy Molecular Biology Server: www.expasy.org Israeli National Node: inn.org.il Open source news: bioinformatics.org German directory: bioinformatik.de 14
Bioinformatics: How? Pre-packaged tools Majority on World Wide Web Some require downloading Most are free to use Beginning development Mostly Unix environment Perl programming language 15
The Trouble with Nature Hard to represent Understanding still incomplete Some problems insoluble? 16
The Trouble with Man Confusing choice of tools Developed independently Written by and for nerds 17
Making it Simpler 18
PubMed MEDLINE publication database Over 17,000 journals Some other citations Papers from 1960s Over 12,000,000 entries Alerting services http://www.pubcrawler.ie/ http://www.biomail.org/ 19
A PubMed Entry Fields: journal reference (volume, number, date, pages); title, authors, affiliation; abstract. Links: related articles; full text (sometimes); database entries. Example: Cancer 2003 May 1;97(9):2248-53. Pregnancy and early-stage melanoma. Daryanani D, Plukker JT, De Hullu JA, Kuiper H, Nap RE, Hoekstra HJ. Division of Surgical Oncology, University Medical Center, Groningen, The Netherlands. BACKGROUND: Cutaneous melanomas are aggressive tumors with an unpredictable 20
Searching PubMed Structureless searches Automatic term mapping Structured searches Field names, e.g. [au], [ta], [dp], [ti] Boolean operators, e.g. AND, OR, NOT, () Additional features Subsets, limits Clipboard, history 21
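A structured search combines field-tagged terms with Boolean operators. The sketch below composes such a query as a string and wraps it in a search URL. The helper names (`field_term`, `all_of`, `any_of`) are hypothetical, and the use of the NCBI E-utilities `esearch.fcgi` endpoint is an assumption for illustration; the slides describe only the PubMed web interface. No network request is made.

```python
from urllib.parse import urlencode


def field_term(value: str, tag: str) -> str:
    """Attach a PubMed field tag, e.g. field_term("Cancer", "ta") gives "Cancer[ta]"."""
    return f"{value}[{tag}]"


def all_of(*terms: str) -> str:
    """Join terms with AND, parenthesized so the group can be nested."""
    return "(" + " AND ".join(terms) + ")"


def any_of(*terms: str) -> str:
    """Join terms with OR, parenthesized so the group can be nested."""
    return "(" + " OR ".join(terms) + ")"


# Title word, journal abbreviation, publication date, and either of two authors.
query = all_of(
    field_term("melanoma", "ti"),
    field_term("Cancer", "ta"),
    field_term("2003", "dp"),
    any_of(field_term("Daryanani D", "au"), field_term("Hoekstra HJ", "au")),
)
print(query)

# Assumption: the query is submitted through NCBI E-utilities rather than the web form.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
url = ESEARCH + "?" + urlencode({"db": "pubmed", "term": query})
print(url)
```

Grouping with parentheses keeps the OR over authors from being absorbed into the surrounding AND terms. The same query string can be pasted directly into the PubMed search box.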
OMIM Online Mendelian Inheritance in Man Genes and genetic disorders Edited by a team at Johns Hopkins Updated daily Entries 10,670 single-locus phenotypes (*) 1,294 multi-locus phenotypes (#) 2,415 unclassified phenotypes 22
An OMIM Entry Sections: phenotype description; clinical features; diagnosis and treatment; molecular genetics; inheritance model; mapping history; genetic locus/loci; references. Example: CYSTIC FIBROSIS; CF. Alternative titles; symbols: MUCOVISCIDOSIS. Gene map locus 7q31.2. DESCRIPTION: Manifestations relate not only to the disruption of exocrine function of the pancreas 23
Searching OMIM Search Fields Disease name, e.g. hypertension Cytogenetic location, e.g. 1p31.6 Inheritance, e.g. autosomal dominant Browsing Interfaces Alphabetical by disease Genetic map Additional features like PubMed 24