Current questions in science. How can Bioinformatics help to solve them?


Overview
Introduction: Historical overview; Current questions in science
Genome projects: Current status of genome projects; Sequencing strategies and methods; Strategies for gene identification
Proteomics: Proteome (2D gels, mass spectroscopy); Protein interactions; Gene expression (Microarrays, SAGE)
Data analysis: Sequence comparison; Data mining

Historical overview
Classification in biology: Carl von Linné (1707-1778)
Evolution: Charles Darwin (1809-1882)
Genetics: Gregor Mendel (1822-1884)
1869: Discovery of nuclein, Friedrich Miescher (1844-1895)
1952: DNA is the genetic material, Hershey-Chase
1953: Molecular structure of DNA: Chargaff; James Watson, Francis Crick (1962 Nobel Prize)
1970: Recombinant DNA, DNA sequencing: Walter Gilbert, Frederick Sanger, Paul Berg (1980 Nobel Prize)
1983: Amplification of DNA (PCR): Kary Mullis & others (1993 Nobel Prize)

Classical genetics: mutant, phenotypic feature, protein function, gene. Topics: biochemical pathways, enzymes, cell cycle, visual signal response, development of tissues, development of organisms, embryogenesis, immune response, receptor proteins, hormones.

The limits. The dogma "gene, protein, specific function" is not true for all biological functions. Cellular processes involve many different gene products and their interactions. Cellular processes are complex and multidimensional. This calls for a completely new kind of research.

Current questions in science: Genome, Transcriptome, Regulome, Proteome, Metabolome. High throughput!

Current questions in science: to understand complex biological processes in the cell and organism (proteomics). Fields: biology, research, medicine, disease, diagnostics, biotechnology, pharmacology, drug targeting, synthetic substances.

Methanococcus jannaschii

Highlights in Genome Projects

Organism                         Year   Millions of bases   Number of genes   Genes per million bases
Saccharomyces cerevisiae         1996   12                  5800              483
Caenorhabditis elegans           1998   97                  19099             197
Drosophila melanogaster          2000   116                 13601             117
Arabidopsis thaliana             2000   115                 25498             221
Human genome (public sequence)   2001   2693                31780             12
Human genome (Celera)            2001   2654                39114             15
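The last column of the table is simply the gene count divided by the genome size in megabases. A minimal sketch of that arithmetic is given here, using the figures from the slide; the function name and the data layout are illustrative assumptions, not part of the original lecture.

```python
# Rows copied from the slide: (organism, year, millions of bases, number of genes).
GENOMES = [
    ("Saccharomyces cerevisiae", 1996, 12, 5800),
    ("Caenorhabditis elegans", 1998, 97, 19099),
    ("Drosophila melanogaster", 2000, 116, 13601),
    ("Arabidopsis thaliana", 2000, 115, 25498),
    ("Human genome (public sequence)", 2001, 2693, 31780),
    ("Human genome (Celera)", 2001, 2654, 39114),
]


def genes_per_megabase(genes, megabases):
    """Gene density: number of genes per million bases."""
    return genes / megabases


for name, year, megabases, genes in GENOMES:
    density = genes_per_megabase(genes, megabases)
    print(f"{name:32s} {year}  {density:7.1f} genes per Mb")
```

The contrast between yeast (several hundred genes per million bases) and human (roughly a dozen) is the point the slide makes about how sparsely genes are spread in large genomes.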

Complete genomes: Whole-genome sequences for more than 800 organisms (bacteria, archaea, and eukaryota, as well as many viruses and organelles) are either complete or being determined.

Human Genome Project. Goals: determine the sequence (0.75 Gb of data); identify all the genes in the human DNA; store this information in databases; develop tools for data analysis; and address the ethical, legal, and social issues that may arise from the project.

Human Genome Project
30,000-40,000 genes. Current estimate: 100,000-140,000 functional genes.
More transcripts due to alternative splicing or recombination (~ one gene, three proteins).
More than 95% of the human genome is not coding: mostly DNA with unknown functions.
Proteome: ~ 250,000 proteins, 1,300 protein families.
CpG islands (45,000 per haploid genome).
Repeated sequences (SINEs and LINEs).

High throughput! Genome projects: sequencing a genome
Clone large parts into special vectors (BACs, can contain up to 1 Mbp).
Primer walking: sequence the BACs from beginning to end.
Shotgun sequencing: fractionate the BAC insert into small fragments, shotgun sequence these (only the ends), and computer-assemble all pieces. A 10-fold excess is essential.
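The computer-assembly step can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest suffix/prefix overlap. This is only a sketch under strong assumptions (error-free reads, a single strand, no repeats); production assemblers are far more elaborate, and the function names and the example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by largest overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no overlaps left: the remaining reads stay separate contigs
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical overlapping fragments of one short sequence.
fragments = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(fragments))
```

The example also hints at why a large excess of sequence is needed: without reads that overlap each other, the fragments cannot be ordered and the assembly falls apart into separate contigs.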

Genome projects: physical maps of genomes are built by mapping known diseases to certain areas, or by placing more abstract landmarks on the map, such as PCR fragments (STS, sequence tagged sites). These can be random fragments of DNA or those corresponding to ESTs or other cDNAs.
EXAMPLE: Duchenne Muscular Dystrophy. Duchenne muscular dystrophy (DMD) is one of a group of muscular dystrophies characterized by enlargement of muscles. All are X-linked and affect mainly males. "Dystrophy" refers to any of a number of disorders characterized by weakening, degeneration or abnormal development of muscle.
[Figure: X chromosome]

Genome projects Fluorescent in situ hybridisation

Predicting protein-encoding genes. Transcription: pre-mRNA. Splicing: mRNA with poly(A) tail, (A)200. Translation and modification: protein.

Three basic strategies to find gene-specific sequence motifs: homology searching; analysis of sequence signals; statistical analysis. In addition: whole genome comparison.

Ideally, gene prediction tools should be able to identify and automatically annotate all genes.
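As a very small illustration of the "analysis of sequence signals" strategy, the sketch below scans the three forward reading frames for open reading frames, that is, a start codon ATG followed in frame by a stop codon. It is a deliberately naive assumption-laden example (forward strand only, no introns, no statistics), not a real gene predictor; the function name, the minimum length and the test sequence are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    start is the index of the ATG, end is one past the stop codon,
    and min_codons is the minimum number of codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


sequence = "CCATGAAACCCGGGTAACC"  # hypothetical test sequence
for start, end, frame in find_orfs(sequence):
    print(frame, start, end, sequence[start:end])
```

In eukaryotic genomes, where genes are interrupted by introns, such signal scanning is only one ingredient and has to be combined with homology and statistical evidence, as the slide indicates.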

Recognition sites for gene regulation

Genome comparison

Genomes

Bioinformatics: Why sequence comparison? Evolutionary relationships. [Figure: an ancestor gene and its descendants in species 1, species 2 and species 3, labelled as paralogs and orthologs]

Homo sapiens chromosome X versus Mus musculus chromosome X

Identical genome Totally different proteome

Proteome: sequencing the proteome. Take tissue, isolate proteins, run a 2-D SDS-PAGE, isolate single protein spots, sequence the protein.

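2-dimensional SDS-PAGE
Step 1: first dimension, isoelectric focusing, which separates proteins by charge (figure: gel with + and - poles).
Step 2: second dimension, SDS-PAGE, which separates proteins by size (figure: gel with - and + poles).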

Proteome: identify a protein with mass spectrometry. Take tissue, isolate proteins, run a 2-D SDS-PAGE, isolate single protein spots, digest enzymatically, perform peptide mass fingerprinting, and search a database.
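The last three steps (enzymatic digestion, peptide mass fingerprinting, database search) can be sketched in a few lines. The example assumes trypsin as the enzyme (cleavage after K or R, except before P), no missed cleavages, unmodified peptides and monoisotopic residue masses; the tiny "database", the protein names and the tolerance are hypothetical and only serve the illustration.

```python
# Monoisotopic residue masses in Dalton (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def trypsin_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, current = [], ""
    for i, residue in enumerate(protein):
        current += residue
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and following != "P":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def fingerprint_score(observed_masses, protein, tolerance=0.2):
    """Count observed masses that match a theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in trypsin_digest(protein)]
    return sum(
        any(abs(mass - t) <= tolerance for t in theoretical)
        for mass in observed_masses
    )


# Hypothetical mini-database and hypothetical measured peptide masses.
database = {"protA": "MKWVTFRPLKGG", "protB": "AAAKGGGR"}
observed = [277.15, 132.05]
best = max(database, key=lambda name: fingerprint_score(observed, database[name]))
print(best)
```

Real search engines score matches statistically and allow for modifications and missed cleavages; the counting scheme here is only the simplest possible stand-in for the database search step.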

Proteome
pre-mRNA: 5'UTR, Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, 3'UTR, with start codon ATG and stop codon TAA
Splicing / polyadenylation
mRNA: ATG ... TAA, poly(A) tail AAAAAAAAA
Translation
Active protein: CPLTW...GFL; splice variant: CPLTW...PJC; post-translational modification: CPLTW...LAC

Proteome

Proteomics Ultimate goal of proteomics Identical expression pattern Receptor/ligand relationship Sequence identity

Novel definitions in biology
Genome: the complete set of chromosomes with the genes they contain. It's more or less static information!
Proteome: all proteins encoded by the genome, including splice variants, post-translational modifications, polymorphisms, disease mutations.
Proteomics

HPI Human Proteomics Initiative A major effort of the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI) GOALS Annotation of all known human proteins (+ splice variants). Annotation of all known human polymorphisms and disease mutations. Annotation of all known post-translational modifications in human proteins. Tight links to structural information. Annotation of mammalian orthologs of human proteins.

Proteomics - Many interactions with other proteins and compounds - Changes of protein concentrations: - Subcellular localization - Time - Tissue - Developmental stages -..

Proteomics. Goal: reconstructing the molecular circuitry of a living cell. Techniques: molecular genetics; gene expression (microarrays, SAGE); protein analysis (2-D gel electrophoresis, mass spectroscopy); protein interactions (peptide arrays, yeast two-hybrid); and bioinformatics to integrate heterogeneous data from different knowledge databases.

High throughput! DNA Microarrays: DNA microarrays are well suited for comparing gene expression. Different samples are compared to find, e.g., tissue-specific genes, regulatory gene defects in cancer, disease-related metabolic pathways, candidate genes, and cellular responses to the environment.

Microarray technique: Affymetrix GeneChip or spotted DNA microarrays. Two different cell samples; RNA extraction and reverse transcription; labelling; hybridisation.

Microarrays: a flood of data
Data collection: arrays are scanned to extract signal intensities from the image.
Normalization: data is calibrated, e.g. by dividing RNA signal by genomic DNA signal.
Clustering: bioinformatics methods identify groups of up- and down-regulated genes.
Annotation: bioinformatics methods / data mining, to get more information about the function and interaction of up- and down-regulated genes.
Submission to public repositories.
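The normalization and grouping steps can be sketched as follows. The example divides each RNA signal by the genomic DNA signal of the same spot, as mentioned on the slide, and then labels genes as up- or down-regulated by their log2 ratio between two samples. The two-fold cutoff, the gene names and all intensities are assumptions chosen for illustration, and a simple threshold stands in for real clustering.

```python
import math


def normalize(rna_signal, genomic_signal):
    """Calibrate each spot by dividing the RNA signal by the genomic DNA signal."""
    return {gene: rna_signal[gene] / genomic_signal[gene] for gene in rna_signal}


def classify(reference, sample, fold=2.0):
    """Label each gene 'up', 'down' or 'unchanged' by its log2 expression ratio."""
    cutoff = math.log2(fold)
    calls = {}
    for gene in reference:
        log_ratio = math.log2(sample[gene] / reference[gene])
        if log_ratio >= cutoff:
            calls[gene] = "up"
        elif log_ratio <= -cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical scanned intensities for three spots.
genomic = {"gene1": 100.0, "gene2": 200.0, "gene3": 50.0}
healthy_rna = {"gene1": 100.0, "gene2": 400.0, "gene3": 50.0}
tumor_rna = {"gene1": 400.0, "gene2": 100.0, "gene3": 60.0}

calls = classify(normalize(healthy_rna, genomic), normalize(tumor_rna, genomic))
print(calls)
```

Working with log ratios makes a two-fold increase and a two-fold decrease symmetric around zero, which is why they are commonly preferred over raw ratios for this kind of comparison.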

High throughput! Serial Analysis of Gene Expression (SAGE)
Isolate tissue-specific RNA (with poly(A) tail, AAAAAAA).
Reverse transcribe to cDNA (4 nucleotides, RT, oligo(dT) primer TTTTTT...).
The cDNA is linked to a matrix via biotin/streptavidin.
Digest with enzyme 1.
Remove unbound fragments.

Serial Analysis of Gene Expression (SAGE)
Divide the sample in two parts.
Ligate two different linkers to the samples.
Digest with (type II) enzyme (figure labels: linker + RE, 14 bp).
Ligate and amplify with PCR, then clone.

Serial Analysis of Gene Expression (SAGE)
The result is a huge chain of 14 bp fragments. The sequence of the concatemer is determined.
14 bp (4^14 possible combinations) are sufficient to characterize any individual RNA.
Determine the frequency of each transcript.

Transcript    Copies in tumor cells   Copies in healthy cells
interleukin   60                      10
actin         93                      91
synthase      14                      14
unknown       110                     0

Goal: identify novel genes involved in disease or investigate how known genes are regulated.
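Counting tags in a sequenced concatemer, and the arithmetic behind the 4^14 figure, can be sketched as below. The sketch assumes an idealized concatemer made of back-to-back 14 bp tags with no linker or enzyme-site sequence in between, which real SAGE data does not satisfy; the tag sequences are made up, while the copy numbers in the second part are the ones from the slide.

```python
from collections import Counter

TAG_LENGTH = 14


def count_tags(concatemer, tag_length=TAG_LENGTH):
    """Split an idealized concatemer into consecutive tags and count each tag."""
    tags = [
        concatemer[i:i + tag_length]
        for i in range(0, len(concatemer) - tag_length + 1, tag_length)
    ]
    return Counter(tags)


# Number of distinct tags of length 14 over the alphabet A, C, G, T.
print(4 ** TAG_LENGTH)

# Hypothetical tags and concatemer.
tag_a = "ACGTACGTACGTAC"
tag_b = "TTTTGGGGCCCCAA"
print(count_tags(tag_a + tag_b + tag_a))

# Copy numbers from the slide: which transcripts differ most between samples?
tumor = {"interleukin": 60, "actin": 93, "synthase": 14, "unknown": 110}
healthy = {"interleukin": 10, "actin": 91, "synthase": 14, "unknown": 0}
difference = {name: tumor[name] - healthy[name] for name in tumor}
print(sorted(difference, key=difference.get, reverse=True))
```

With about 268 million possible 14 bp tags against a few tens of thousands of genes, a tag is in most cases expected to identify its transcript, which is the argument made on the slide.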

Transcriptome. High throughput! In addition to determining the sequence of the genome (DNA), many laboratories determine the sequence of Expressed Sequence Tags (ESTs): take tissue, isolate RNA, reverse transcribe into DNA, sequence both ends (max. 500 bp). The resulting sequences are ESTs.

High throughput! Protein-protein interaction Peptide arrays Peptide chips provide new ways to study protein-protein interaction, unravel signal transduction pathways, perform multi-parameter diagnosis, study individual immunological repertoires, e.g. autoimmune reactions.

Protein-protein interaction Yeast two hybrid system

Bioinformatics: current status of data analysis
Scientific exploitation of molecular biology databases
Database searches to find related sequences
Pair-wise comparison of two sequences
Alignment of multiple sequences
Evolutionary analysis of molecular sequence data
Analysis of protein secondary structure
Analysis of RNA secondary structure
Gene prediction
...
Thousands of software tools exist.

Bioinformatics: data analysis
Enter and edit sequences
Analyze one sequence: e.g. restriction maps, calculation of MW, structure prediction, gene prediction
Compare sequences: e.g. database searches, assembling, sequence alignments, phylogeny

Bioinformatics: Why sequence comparison? Sequence comparison is often used to find related genes in the database. When dealing with a sequence of unknown function, the presence of similar domains suggests similar function. Homologous sequences share the same ancestral sequence; they can be orthologs or paralogs.
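Pair-wise comparison of two sequences is classically done by dynamic programming. The sketch below implements a basic global alignment in the style of Needleman and Wunsch with a simple match/mismatch/gap scheme; the scoring values and the example sequences are arbitrary assumptions, and database search programs use faster heuristics and substitution matrices instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        """Score for aligning a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: best of diagonal step, gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```

A high alignment score relative to what unrelated sequences would give is the evidence for homology; deciding between orthologs and paralogs then needs additional information about the species and gene history.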

Bioinformatics. Example: Blast2P, searching for homologous sequences

Searching...done
Sequences producing significant alignments:                             Score (bits)   E value
>>>swissprot:25a1_mouse P11928 mus musculus (mouse). (2'-5')oli...      629            e-180
>>>swissprot:25a2_mouse P29080 mus musculus (mouse). (2'-5')oli...      495            e-140
>>>swissprot:25a2_human P04820 homo sapiens (human). (2'-5')oli...      495            e-140
>>>swissprot:25a1_human P00973 homo sapiens (human). (2'-5')oli...      495            e-140
>>>swissprot:25a3_mouse P29081 mus musculus (mouse). (2'-5')oli...      492            e-139
>>>swissprot:25a6_human P29728 homo sapiens (human). 69/71 kd (...      350            2e-96
>>>swissprot:tr14_human Q15646 homo sapiens (human). thyroid re...      77             4e-14
>>>swissprot:rn14_yeast P25298 saccharomyces cerevisiae (baker'...      32             1.5
..

Bioinformatics: Why sequence comparison? Evolution of genes and proteins. Many proteins consist of many different domains which have specific functions. [Figure: gene and protein; gene duplication; domain shuffling]

Bioinformatics Why sequence comparison? Evolution of genes and proteins Gene duplication: Mostly pseudogenes (without function) or Similar gene product with new function (e.g. haemoglobin alpha, beta chain)

Bioinformatics Why sequence comparison? Evolution of genes and proteins Many genes and proteins are members of families which share a common biochemical function or evolutionary origin. Protein A Protein B Protein C Protein D1 Protein D2

Bioinformatics: the birth of molecular evolution. In the early days, evolution was studied by comparison of morphologic features. In the 50s and 60s, the protein sequences of insulin (Sanger), haemoglobins and cytochrome c were available and sequence comparisons became possible.

Bioinformatics The birth of molecular evolution The phylogenetic tree of all cytochrome c proteins The phylogenetic tree of the species (organisms) Comparison... revealed a great overlap, supporting classical phylogeny At the same time, minor variations helped to improve existing trees

Gene prediction. The challenge: the gap between data collection and data interpretation is growing rapidly.

High-throughput data collection World wide collection of data Storage in databases Global efforts to collect: sequence data structure data protein expression profiles functional data metabolic pathways.. Data analysis Bioinformatics Data mining

The Biocomputing Service Group: HUSAR (Heidelberg UNIX Sequence Analysis Resources)
Sequence retrieval: SRS
Analysis packages: GCG / EGCG, BIOCCELERATOR, EST CLUSTERING, PHYLIP, STADEN
Databases: EMBL, GENBANK, Swissprot, PIR, TRANSFAC, ...; genome databases GDB/OMIM, Flybase, AceDB, ...
Mapping methods: Linkage Package, Mapmaker, Crimap, Map, Pedpack, APM, LIPED, LDB, SIGMA
User support: scientific consulting, training, workshops, hotline
Hardware environment

HUSAR Program Package
GCG (~130 programs), EGCG
EMBOSS (~150 programs)
Third-party programs (~150 programs)
In-house developments: own programs, automated tasks
Databases: more than 300, prompt updates (daily, weekly)
SRS (Sequence Retrieval System)

The number of analysis programs is huge, and programs must be combined for many purposes. Users need compact, presentable reports on analysis results, especially for high-throughput analysis.

Mapping in the human genome; exhaustive gene structure analysis; extraction of most recent annotation information; merging with precomputed data from the NCBI pipeline.

http://genome.dkfz-heidelberg.de