Outline. Computational Genomics and Molecular Biology. Genes Encode Proteins. DNA forms a double stranded helix. DNA replication T C A G

Size: px
Start display at page:

Download "Outline. Computational Genomics and Molecular Biology. Genes Encode Proteins. DNA forms a double stranded helix. DNA replication T C A G"

Transcription

1 Computational Genomics and Molecular Biology Dannie Durand Fall 2004 Lecture 1 Genes Encode Proteins DNA forms a double stranded helix GTGCACCTGACTCCTGAG A gene is a DNA sequence V H L T P E A protein is an amino acid sequence A protein folds into a 3D structure A G T C DNA replication cell chromosome aggaggcctcgcctctcccagcatgggctggggctcctgtcccccactgtgtgtgcctggggcctggccaggactcccagtga DNA 1

2 DNA Protein Synthesis Protein Synthesis: Transcription DNA RNA RNA transcription RNA transcription GTGTCGCTGACTCCTTGGCTACCCGAG CACAGCGACTGAGGAACCGATGGGCTC GTGCAC double-stranded DNA sequence GTGCAC actgactccttggctacccgag GaTGTCGGTGCACCTGACTCCTGAG GTGCACCTGACTCCTGAG CACGTGGACTGAGGACTC CACaAGCa CACGTG agactgaggaaccgatgggctc open DNA helix RNA transcription RNA GTGCAC actgactccttggctacccgag GaTGTCGGTGCACCTGACTCCTGAG GTGCACCTGACTCCTGAG CACGTGGACTGAGGACTC CACaAGCaCUGACUCCUUGGCUACCCGAG CACGTG agactgaggaaccgatgggctc RNA transcript Adenine, Guanine, Cytosine, Uracil (AGCU) CUGACUCCUUGGCUACCCGAG GACTGAGGAACCGATGGGCTC Single stranded Secondary structure 2

3 RNA Secondary Structure Protein Synthesis: Translation CCGUGAACGUGUACCGGAUUUUUAUUCCC translation RNA Protein Translation amino acid sequence V H L T P transfer RNA E transfer RNA UGAGGA GUGCACCUGACUCCUGAG messenger RNA ribosome CUC secondary structure tertiary structure Protein Translation amino acid sequence V H L T CUC UGAGGA GUGCACCUGACUCCUGAG messenger RNA P transfer RNA E ribosome tertiary protein structrure Protein Translation amino acid sequence transfer RNA GUGCACCUGGUGCACCUGACUCCUGAGGUGCACCUGACUCCUGAGCUCCUGAG messenger RNA ribosome 3

4 Gene Regulation If I have the same set of genes in every cell, how come my liver cells look so different from my skin cells? Only a small number of genes are being translated into protein at any one time aggaggcctcgcctctcccagcatgggctggggctcctgtcccccactgtgtgtgcctggggcctggccaggactcccagtga protein coding sequence A gene is a location on a chromosome that encodes a protein Translating Genes into Proteins Bacteria Gene Regulation Controls When a Gene Is Transcribed RNA polymerase promoter gene transcription messenger RNA: RNA polymerase translation promoter gene amino acid sequence: Gene Regulation Controls When a Gene Is Transcribed Gene Regulation Controls When a Gene Is Transcribed tryptophan RNA polymerase tryptophan repressor repressor promoter gene promoter gene 4

5 Gene Regulation Controls When a Gene Is Transcribed tryptophan Translating Genes into Proteins: Multicellular organisms introns exon1 exon2 exon3 exon4 transcription repressor RNA transcript: RNA polymerase promoter gene mrna: RNA splicing translation amino acid sequence mrna: Alternate splice forms: Alternative splicing exon1 exon2 exon3 exon4 exon5 exon6 exon1 exon2 exon3 exon4 exon5 exon6 exon1 exon2 exon3 exon4 exon1 exon2 exon3 exon5 exon6 female male Gene Regulation In single cell organisms, gene regulation orchestrates Responses to changing environment Cell cycle In multicellular organisms, gene regulation orchestrates Tissue type differentiation Development from embryo to adulthood The Origins of Computational Biology This course Computational Structural Biology Sanger-Coulson sequencing Maxam-Gilbert sequencing Gilbert, Sanger win Nobel Prize Congress establishes Genbank Human Genome Project begins GenBank goes online ARPANET First royal USENET newgroups TCP/IP Internet World Wide Web, Gopher NCSA Mosaic Pizza Hut goes on line 5

6 Growth of sequence data during the 90 s Collins et al, Science, Oct 1998 National Center for Biotechnology Information Sequence Comparison Why sequence data is so powerful: Sequences are related! atgcaaggagtcccagagcctgagctgactacgt atgcgaggtctcccagtgtctgaactgactaagt global pairwise alignment atgcaag_cgtcccagtgccagaactccctacgt acc_gtggtctccgagtggctgaactgac_aaca local pairwise alignment early mammal atgccaggactcccagtga vtfisll vfrrdah ksevahrfkd lgeenfkalv vtfisll vfrreah kseiahrfnd vgeehfiglv vtfisll vfrrdty kseiahrfkd lgeqyfkglv vtlisfi lqrfardaeh kseiahrynd lkeetfkava global multiple alignment mkwvtfisll flfssaysrg vfrrdah kseva mkwvtfisll flfssaysrg vfrreah kseia ~~wvtfisll flfssaysrg vfrrdty kseia mkwvtlisfi flfssatsrn lqrfardaeh kseia local multiple alignment Applications Database searching structure Evolutionary tree reconstruction Gene finding Sequence assembly atgccaggactcccagtga atgcaaggagtcccagagc human atgcaaggagtcgcagagc atgcaaggagtcccagagc atgcgaggtctcccagtgt rat atgcgaggtctcgtagtgt atgcaaggagtcgcagagc atgcgaggtctcgtagtgt atgggaggtctcccagtgt atgccaggactcccagtga atgcgaggtctcccagtgt mouse atgggaggtctcccagtgt Sequence similarity => functional similarity LWDEFNQLGTE MIVTKAGRRMFP TFQVKLFGMDPM ADYMLLMDFVPV DDKRYRYAFHS BLAST LWDPTFQVEFNQLG TMFPTFEMIVTKAG RRMFPTFQVPTFQV KLFMFPTFGEMDPM ADYMMCFPTFLLMD FVPVDDKPTSFQVR O 6

7 Reconstructing Evolutionary History atgcaaggagtcccagagc gorilla atgcaaggagtcgcagagc early mammal atgccaggactcccagtga human atgcgaggtctcgtagtgt atgcaaggagtcgcagagc atgcaaggagtcgcagagc atgcgaggtctcgtagtgt atgcgaggtctcgtagtgt atgggaggtctcccagtgt atgggaggtctcccagtgt atgcgaggtctcccagtgt chimpanzee atgggaggtctcccagtgt Gene Recognition Problem Gene Recognition Problem aggaggcctataacgcctctcccagcatgggctggggctcctgtcccccactgtggcctggtactggccaggactcccagtga human aggaggactataacgcctctcccagcatgggctggggctcctgtcccccactgtggcctggtactgcgccaggactcgtagtga? cgacgccatataaattagtaatgtactatgggctggggcgtgatacgtacactgtggcctggtagctatgcagcacgtgtctagtga Mouse Gene Recognition Problem human aggaggactataacgcctctcccagcatgggctggggctcctgtcccccactgtggcctggtactgcgccaggactcgtagtga cgacgccatataaattagtaatgtactatgggctggggcgtgatacgtacactgtggcctggtagctatgcagcacgtgtctagtga Mouse 7

8 RNA Secondary Structure RNA Secondary Structure CCGUGUACGUGUAACGGAUCUUUAUUCCC CCGUGAACGUAUACUGGAGUUUUAGUCCG GCGUCACCGUGUUCCGGAUUAUGAUUCCC CCGAGAACGUCCACGGGAUUUCUGUUCCA RNA Secondary Structure CCGUGUACGUGUAACGGAUCUUUAUUCCC CCGUGAACGUAUACUGGAGUUUUAGUCCG GCGUCACCGUGUUCCGGAUUAUGAUUCCC CCGAGAACGUCCACGGGAUUUCUGUUCCA Structure Determination is Hard Predicting Protein Structure X-ray Diffraction Nuclear Magnetic Resonance Tertiary structure: Detailed physical models Find the configuration of minimum energy Secondary structure: eg, Which amino acids participate in a helix? Secondary structure motifs: eg, Is this a coiled-coil protein? Threading: Estimate protein structure from a related protein with known structure and similar sequence 8

9 Predicting Protein Structure Complete physical model of tertiary structure Predicting Protein Structure Tertiary structure: Detailed physical models Find the configuration of minimum energy Secondary structure: eg, Which amino acids participate in a helix? Secondary structure motifs: eg, Is this a coiled-coil protein? Threading: Estimate protein structure from a related protein with known structure and similar sequence Predicting Protein Structure Protein Secondary Structure Complete physical model of tertiary structure Too hard! Secondary structure: Does this amino acid participate in an alphahelix? alpha helix beta sheet Predicting Protein Structure Is this a coiled-coil protein? Tertiary structure: Detailed physical models Find the configuration of minimum energy Secondary structure: eg, Which amino acids participate in a helix? Secondary structure motifs: eg, Is this a coiled-coil protein? Threading: Estimate protein structure from a related protein with known structure and similar sequence 9

10 Predicting Protein Structure Threading Tertiary structure: Detailed physical models Find the configuration of minimum energy Secondary structure: eg, Which amino acids participate in a helix? Secondary structure motifs: eg, Is this a coiled-coil protein? Threading: Estimate protein structure from a related protein with known structure and similar sequence Structure? vkltpegtr_wgghpldekflske Estimate protein structure from a related protein with known structure and similar sequence vhltpettrgwgghmldekeiske Structure? Threading Structural Genomics A world-wide consortium to determine novel protein structures Which proteins are likely to be novel? vkltpegtr_wgghpldekflske vhltpettrgwgghmldekeiske Estimate protein structure from a related protein with known structure and similar sequence Pairwise sequence alignment (global and local) Multiple sequence alignment global local Substitution matrices Gene Finding Database searching BLAST Sequence statistics An overview of computational molecular biology Thurs: Sequencing technology and whole genome sequence assembly Next Tuesday: Evolutionary tree reconstruction Protein structure prediction RNA structure prediction 10