BIOINFORMATICS 1 INTRODUCTION TO MOLECULAR EVOLUTION EVOLUTION BY DESCENT WITH MODIFICATION DUALITY OF MOLECULAR EVOLUTION DNA AS A GENETIC MATERIAL


INTRODUCTION TO BIOINFORMATICS 1
or why biologists need computers
Prof. Dr. Wojciech Makałowski, Institute of Bioinformatics

MOLECULAR EVOLUTION

DUALITY OF MOLECULAR EVOLUTION
Early stages of evolution: from primordial conditions to the first cell
Evolution of living organisms at the molecular level

EVOLUTION BY DESCENT WITH MODIFICATION
"I should infer from analogy that probably all the organic beings which have ever lived on this earth have descended from some one primordial form, into which life was first breathed."
Charles Darwin, On the Origin of Species, p. 484

MATERIAL BASIS OF GENETIC CONTINUITY
Germ plasm theory (August Weismann)
The germline is separated from the soma
The immortal germline passes genetic information from one generation to the next
The germ cells are influenced neither by the environment nor by learning or morphological changes that happen during the lifetime of an organism; that information is lost after each generation.

DNA AS A GENETIC MATERIAL
Alfred Hershey, Nobel Prize in Physiology or Medicine in 1969
Hershey-Chase experiment (1952)

Hershey-Chase experiment

DISCOVERY OF DNA STRUCTURE

CENTRAL DOGMA OF MOLECULAR BIOLOGY
DNA -> (transcription) -> RNA -> (translation) -> Protein
Francis Crick (1970) Nature, 227:
DNA: CCTGAGCCAACTATTGATG
RNA: CCUGAGCCAACUAUUGAUG
Protein: PEPTID

FIRST BOOK ON MOLECULAR EVOLUTION
Christian Anfinsen, The Molecular Basis of Evolution (1959)

GENETIC MATERIAL CHANGES OVER TIME
THISISANANCESTRALSEQUENCE -> (time) -> THISISCOMPLETELYNEWSEQUENCE

MUTATIONAL CHANGES OF DNA SEQUENCES
Small scale: substitutions, insertions, deletions
Large scale: chromosomal rearrangements, gene duplications, transposable elements, inversions, horizontal gene transfer
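The DNA -> RNA -> protein example on the central dogma slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming the standard genetic code and reading the sequence from its first base; the function names `transcribe` and `translate` are illustrative and not part of the lecture.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Return the RNA copy of a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate RNA codon by codon; stop at a stop codon, ignore a trailing partial codon."""
    dna = rna.upper().replace("U", "T")
    peptide = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        peptide.append(amino_acid)
    return "".join(peptide)


dna = "CCTGAGCCAACTATTGATG"  # sequence shown on the slide
rna = transcribe(dna)
peptide = translate(rna)
print(rna)
print(peptide)
```

The final G of the slide sequence does not form a complete codon, so it is left untranslated.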

SMALL SCALE MUTATIONS AND THEIR CONSEQUENCES
Substitution (synonymous): ACC TAC TTG CTG
Substitution (nonsynonymous): ACC TCT TTG CTG
Deletion (frameshift): ACC TAC TGC TG -> Thr Tyr Cys
Insertion (frameshift): ACC TAC TTT GCT G -> Thr Tyr Phe Ala
Substitution (nonsense): ACC TAA TTG CTG -> Thr Stop
Inversion: ACC TTT ATG CTG -> Thr Phe Met Leu

LARGE SCALE MUTATIONS
CHROMOSOMAL REARRANGEMENTS
Chromosomal translocation
Unbalanced translocation
Inversions
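The consequences of the small scale mutations listed above can be checked by translating each sequence. The sketch below assumes the standard genetic code and a reading frame starting at the first base; `translate` is an illustrative helper, and stop codons are written as `*`.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons only; record '*' for a stop codon and then stop."""
    dna = dna.replace(" ", "").upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


# Sequences copied from the slide, in the order in which they appear.
slide_sequences = {
    "substitution (synonymous)": "ACC TAC TTG CTG",
    "substitution (nonsynonymous)": "ACC TCT TTG CTG",
    "deletion": "ACC TAC TGC TG",
    "insertion": "ACC TAC TTT GCT G",
    "substitution (nonsense)": "ACC TAA TTG CTG",
    "inversion": "ACC TTT ATG CTG",
}

for label, sequence in slide_sequences.items():
    print(f"{label:30s} {sequence:18s} {translate(sequence)}")
```

The deletion and the insertion change the number of bases by one, so every codon downstream of the change is read in a different frame.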

LARGE SCALE MUTATIONS
TRANSPOSABLE ELEMENTS

LARGE SCALE MUTATIONS
HORIZONTAL GENE TRANSFER

EVOLUTIONARY CHANGES OF AMINO ACID SEQUENCES
Human    V-LSPADKTN VKAAWGKVGA HAGEYGAEAL ERMFLSFPTT KTYFPHF-DL SHGSAQVKGH 60
Horse    ...A......S...G......G A.
Cow      ...A...G....G..A
Kangaroo ...A...GH...I...G...A..G...T.H IQA.
Newt     MK..AE..H...TT.DHIKG.EEAL... F...T.L.A. R...AK....E..SFLHS.
Carp     S...DK..AA..I..A.ISP K.DDI... G..LTVY.Q....A.WA...P..GP..-.

Human    GKKVA-DALT NAVAHVDDMP NALSALSDLH AHKLRVDPVN FKLLSHCLLV TLAAHLPAEF 120
Horse    ...G.. L..G.L..L. G...D..N S...V...ND.
Cow      .A...A... K..E.L..L. G...E S......S...SD.
Kangaroo ...I...G Q..E.I..L. GT..K F...GDA.
Newt     ...M.G..S...I..ID A..CK...K..QD.M...A..PK.A.NI.. VMGI..K.HL
Carp     ...IMG.VG D..SKI..LV GG.AS..E...S...A...I.ANHIV. GIMFY..GD.

Human    TPAVHASLDK FLASVSTVLT SKYR 144
Horse    .....S
Cow      ......N
Kangaroo ..E......A
Newt     .YP..C.V....DV.GH
Carp     P.E..M.V...FQNLALA.S E...
$p = n_d / n$, where $n_d$ is the number of amino acid differences between two sequences and $n$ is the number of sites compared.
Note: indels are excluded from the calculation.

AMINO ACID DIFFERENCES BETWEEN HEMOGLOBIN α-CHAINS
Species compared: human, horse, cow, kangaroo, newt, carp.
Numbers of amino acid differences are presented above the diagonal and proportions of different amino acids (p) are presented below the diagonal.

GRADUAL UNDERESTIMATION OF THE REAL DISTANCE
[Figure: number of substitutions per site plotted against time in million years]
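The proportion of differences p = nd/n used for the hemoglobin table can be computed directly from two aligned sequences. This is a minimal sketch: `p_distance` is an illustrative name, and the second sequence in the example is hypothetical, invented only to show the handling of gap columns.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns in which either sequence has a gap (an indel) are excluded,
    as the slide prescribes.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue  # indel column: not counted in n
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


human = "V-LSPADKTN"  # first ten alignment columns of the human row
other = "VKLSAADKSN"  # hypothetical sequence, for illustration only
p = p_distance(human, other)
print(f"p = {p:.3f}")
```

In this example one column is skipped because of the gap, leaving nine compared sites of which two differ.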

POISSON CORRECTION TO THE RESCUE
If $r$ is the rate of amino acid substitution per site per year, the mean number of amino acid substitutions per site after $t$ years is $rt$, and the probability of $k$ amino acid substitutions at a given site ($k = 0, 1, 2, \ldots$) is given by the following Poisson distribution:
$P(k; t) = e^{-rt} (rt)^k / k!$
Probability that no amino acid change has occurred at a given site:
$P(0; t) = e^{-rt} (rt)^0 / 0! = e^{-rt}$

POISSON CORRECTION TO THE RESCUE
Since we do not know the ancestral sequence, we compare two homologous sequences that diverged $t$ years ago. The probability ($q$) that neither of the homologous sites has changed is:
$q = (e^{-rt})^2 = e^{-2rt}$
This probability can be estimated by $q = 1 - p$. Substituting into the equation above, the total number of amino acid substitutions per site for the two sequences ($d = 2rt$) is given by
$d = -\ln(1 - p)$

AMINO ACID DIFFERENCES BETWEEN HEMOGLOBIN α-CHAINS
Species compared: human, horse, cow, kangaroo, newt, carp.
Poisson-correction (PC) distances are presented above the diagonal and proportions of different amino acids (p) are presented below the diagonal.

EVOLUTIONARY CHANGES OF NUCLEOTIDE SEQUENCES
More complicated than that of protein sequences
Various types of DNA regions: protein-coding, non-coding, exons, introns

EVOLUTIONARY CHANGES OF NUCLEOTIDE SEQUENCES
Alignment of the mitochondrial cytochrome b coding sequences
Human  CCAATACGCAAAATTAACCCCCTAATAAAATTAATTAACCACTCATTCATCGACCTCCCC
Rhesus ...TCC...AA.C...A...T.G...C...T..TT.A...
Human  ACCCCATCCAACATCTCCGCATGATGAAACTTCGGCTCACTCCTTGGCGCCTGCCTGATC
Rhesus ...C...C...ATG...T...T...CA...A..T
Human  CTCCAAATCACCACAGGACTATTCCTAGCCATGCACTACTCACCAGACGCCTCAACCGCC
Rhesus T.A...T...C...C...A..A...A...CT...
Human  TTTTCATCAATCGCCCACATCACTCGAGACGTAAATTATGGCTGAATCATCCGCTACCTT
Rhesus ..C..C...A..T...C...T...G..C..T...CT...C
Human  CACGCCAATGGCGCCTCAATATTCTTTATCTGCCTCTTCCTACACATCGGGCGAGGCCTA
Rhesus ...T...T...C...T...T...T
Human  TACTACACAATCAAAGACGCCCTCGGCTTACTTCTCTTCCTTCTCT---CCTTAATGACA
Rhesus ...AT...A AG..C...TTA..C..GCA...

NUCLEOTIDE DIFFERENCES BETWEEN SEQUENCES
$p = n_d / n$
$n_d$: number of different nucleotides between two sequences
$n$: total number of nucleotides examined

GRADUAL UNDERESTIMATION OF THE REAL DISTANCE
[Figure: estimated number of substitutions per site (d) plotted against expected number of substitutions per site (d)]

DIFFERENT MODELS OF NUCLEOTIDE SUBSTITUTIONS
Jukes-Cantor model: nucleotide substitution occurs at any nucleotide site with equal frequency
$d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)$
Kimura's two-parameter model: assumes different rates of transitions and transversions
$d = -\frac{1}{2} \ln(1 - 2P - Q) - \frac{1}{4} \ln(1 - 2Q)$
where $P$ is the proportion of transitions and $Q$ is the proportion of transversions

DIFFERENT MODELS OF NUCLEOTIDE SUBSTITUTIONS
[Figure: curves for the Tamura-Nei, Tamura, Kimura two-parameter, and Jukes-Cantor models; both axes labeled expected number of substitutions per site (d)]

NT SUBSTITUTION ESTIMATES USING DIFFERENT METHODS
Table: estimates for the first, second, and third codon positions under the Jukes-Cantor, Kimura, and Tamura-Nei models.
Calculation based on 373 codons of the human/rhesus mitochondrial cytochrome b gene

AMINO ACID DIFFERENCES BETWEEN HEMOGLOBIN α-CHAINS
Species compared: human, horse, cow, kangaroo, newt, carp.
Numbers of amino acid differences are presented above the diagonal and proportions of different amino acids (p) are presented below the diagonal.

MOLECULAR CLOCK
E. Zuckerkandl and L. Pauling
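The distance corrections introduced so far (Poisson for amino acid sequences, Jukes-Cantor and Kimura's two-parameter model for nucleotide sequences) are short enough to sketch in Python. The function names and the two toy sequences below are assumptions made for illustration; they are not taken from the lecture data.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def poisson_distance(p):
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    return -math.log(1.0 - p)


def jukes_cantor_distance(p):
    """Jukes-Cantor distance, d = -(3/4) ln(1 - (4/3) p)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p_distance(P, Q):
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


def transition_transversion_proportions(seq_a, seq_b):
    """Return (P, Q) over columns where both sequences carry A, C, G or T."""
    compared = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # gaps and ambiguous bases are skipped
        compared += 1
        if x == y:
            continue
        same_class = ({x, y} <= PURINES) or ({x, y} <= PYRIMIDINES)
        if same_class:
            transitions += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


# Hypothetical toy alignment: one transition (A/G) and one transversion (G/T).
seq_a = "ACGTACGTAC"
seq_b = "GCGTACTTAC"
P, Q = transition_transversion_proportions(seq_a, seq_b)
p = P + Q
print(f"p = {p:.3f}")
print(f"Jukes-Cantor d = {jukes_cantor_distance(p):.4f}")
print(f"Kimura 2P d    = {kimura_2p_distance(P, Q):.4f}")
```

All three corrections return a value larger than the observed proportion p, which is the underestimation the figure illustrates. `math.log` raises `ValueError` when its argument is not positive, which here signals a proportion too large for the model to correct.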

GLOBAL VS. LOCAL CLOCK
Higher rates in rodents than in other mammals, and a slowdown in primates.
Ohta, T. (1995) J. Mol. Evol. 40:

NEUTRAL THEORY OF MOLECULAR EVOLUTION
M. Kimura
J.L. King and T. Jukes

NEUTRAL THEORY - BASIC LAWS OF MOL. EVOLUTION
For each protein, the rate of evolution in terms of amino acid substitution is approximately constant per year per site for various lines, as long as the function and tertiary structure of the molecule remain essentially unaltered.
Cambridge University Press,

NEUTRAL THEORY - BASIC LAWS OF MOL. EVOLUTION
Functionally less important molecules or parts of a molecule evolve faster than more important ones.
Those mutant substitutions that are less disruptive to the existing structure and function of a molecule (conservative substitutions) occur more frequently in evolution than more disruptive ones.
Practical consequence: parts of a genome evolving under constraint are likely to have a biological function.

NEUTRAL THEORY - BASIC LAWS OF MOL. EVOLUTION
Gene duplication must always precede the emergence of a gene having a new function.
Selective elimination of definitely deleterious mutants and random fixation of selectively neutral or very slightly deleterious mutants occur far more frequently in evolution than positive Darwinian selection of definitely advantageous mutants.
Neutral mutations: $s < 1/(2N)$
s - selection coefficient; N - effective population size

PROBLEM WITH NEUTRAL MUTATION DEFINITION
If a deleterious mutation with s = [...] occurs in a population of N = 10^6, s is much greater than 1/(2N) = 5 x 10^-7. Therefore, this mutation will not be called neutral. However, the fitness of mutant homozygotes will be lower than that of wild-type homozygotes only by [...].
In the case of brother-sister mating N = 2, so that even a semilethal mutation with s = [...] will be called neutral. If this mutation is fixed in the population, the mutant homozygote has a fitness of 0.5 compared with the non-mutant homozygote.

NEARLY NEUTRAL EVOLUTION
T. Ohta
[Figure: two diagrams; the first has regions labeled purifying selection, drift, and positive selection; the second has regions labeled purifying selection, drift & selection, drift, drift & selection, and positive selection]
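The dependence of the neutrality criterion on population size can be made concrete with a short check. This is only a sketch: `is_neutral` is an illustrative name, the comparison uses the magnitude of s, and the selection coefficients in the example are hypothetical values chosen for illustration rather than the ones from the slide.

```python
def is_neutral(s, effective_population_size):
    """Apply the criterion from the slide: a mutation is called neutral if |s| < 1/(2N)."""
    threshold = 1.0 / (2.0 * effective_population_size)
    return abs(s) < threshold


# Hypothetical selection coefficients, chosen only to illustrate the criterion.
large_population = 10**6  # threshold 1/(2N) = 5e-7
sib_mating = 2            # threshold 1/(2N) = 0.25

print(is_neutral(0.001, large_population))  # weak selection, very large population
print(is_neutral(0.2, sib_mating))          # strong selection, tiny population
```

The same function labels a weakly selected mutation as non-neutral in a very large population and a strongly selected one as neutral under brother-sister mating, which is the difficulty the slide points out.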

NT SUBSTITUTION ESTIMATES USING DIFFERENT METHODS
Table: estimates for the first, second, and third codon positions under the Jukes-Cantor, Kimura, and Tamura-Nei models.
Calculation based on 373 codons of the human/rhesus mitochondrial cytochrome b gene
The third codon positions evolve much faster than the first and the second ones.

SYNONYMOUS AND NONSYNONYMOUS CHANGES

HOW TO CALCULATE SYNONYMOUS AND NONSYNONYMOUS CHANGES?
Very simple idea: calculate the ratio of nonsynonymous (dN) to synonymous (dS) substitution rates:
$\omega = d_N / d_S$
Comparing the substitution rates at the first and second codon positions with the rate at the third codon position may be a good approximation:
$\omega \approx [(d(1) + d(2)) / 2] / d(3)$
ω = 1 -> neutral evolution
ω > 1 -> positive (Darwinian) selection
ω < 1 -> negative (purifying) selection

HOW TO CALCULATE SYNONYMOUS AND NONSYNONYMOUS CHANGES?
Table: estimates for the first, second, and third codon positions and the resulting ω under the Jukes-Cantor, Kimura, and Tamura-Nei models.
Calculation based on 373 codons of the human/rhesus mitochondrial cytochrome b gene
ω is much lower than one, indicating that cytochrome b is subject to purifying selection.

HOW TO CALCULATE SYNONYMOUS AND NONSYNONYMOUS CHANGES?
However, this method is not very accurate: not all third codon position substitutions are synonymous and not all first codon position substitutions are nonsynonymous. For instance:
TTT (Phe) <-> TTA (Leu) <-> CTA (Leu)
Therefore, more accurate calculations are required, but exact estimation of synonymous and nonsynonymous substitutions is not a trivial task.
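The codon-position approximation of ω is easy to compute once per-position distances are available. The sketch below uses hypothetical distances, since the values of the table are not reproduced here; the function names and the `tolerance` band around 1 are assumptions added for illustration.

```python
def omega_from_codon_positions(d1, d2, d3):
    """Approximate omega as the mean of the first and second position distances over the third."""
    if d3 <= 0:
        raise ValueError("third-position distance must be positive")
    return ((d1 + d2) / 2.0) / d3


def interpret_omega(omega, tolerance=0.05):
    """Classify omega relative to 1; the tolerance band is an arbitrary illustrative choice."""
    if abs(omega - 1.0) <= tolerance:
        return "neutral evolution"
    if omega > 1.0:
        return "positive (Darwinian) selection"
    return "negative (purifying) selection"


# Hypothetical per-position distances (substitutions per site), not the lecture's values.
d_first, d_second, d_third = 0.10, 0.04, 0.70
omega = omega_from_codon_positions(d_first, d_second, d_third)
print(f"omega = {omega:.2f}: {interpret_omega(omega)}")
```

Because the approximation treats every third-position change as synonymous and every first- and second-position change as nonsynonymous, it should be read as a rough screen rather than an estimate of dN/dS.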

METHODS FOR ESTIMATING dN AND dS
Evolutionary pathway: Nei-Gojobori, Modified Nei-Gojobori
Based on Kimura's 2-P model: Li-Wu-Luo, Pamilo-Bianchi-Li, Comeron, Ina
Likelihood with codon substitution models*: Goldman-Yang, Nielsen-Yang
* implemented in codeml, part of the PAML package

ISOCHORES
G. Bernardi
Bernardi G PNAS 2007;104:

THE NEW MUTATION THEORY OF EVOLUTION
M. Nei

BIOINFORMATICS CREED
Remember about biology
Do not trust the data
Use comparative approach
Use statistics
Know the limits
Remember about biology!!!