Disease and selection in the human genome 3: Ka/Ks revisited

Please sit in row K or forward
RBFD: human populations, adaptation and immunity
[Image: Neandertal Museum, Mettmann, Germany]

Sequence genome
Population A individual 1   GCCAACCGGAATGTGTA... TAGGAGAAGCGTAAG...
Population A individual 2   ACCATCCGGAATGTGTA... TAGGAGAAGCGTAAG...
Population B individual 1   ACCAACCGGAATGTGTA... TAGGAGAAGCGCAAG...

Measure expression of innate immune cells in response to pathogen proteins: Bacterial antigen 1, Bacterial antigen 2, Viral antigen 1

Goal: identify SNPs which explain expression differences
Topics for today

Ka/Ks in the human genome
What's in the non-protein-coding part of the genome?
Ka/Ks from the HIV unit: a brief review

Codon table (first letter selects the block of rows, second letter the column):

First                  Second letter
letter   T            C            A            G
T        TTT Phe F    TCT Ser S    TAT Tyr Y    TGT Cys C
         TTC Phe F    TCC Ser S    TAC Tyr Y    TGC Cys C
         TTA Leu L    TCA Ser S    TAA Stop     TGA Stop
         TTG Leu L    TCG Ser S    TAG Stop     TGG Trp W
C        CTT Leu L    CCT Pro P    CAT His H    CGT Arg R
         CTC Leu L    CCC Pro P    CAC His H    CGC Arg R
         CTA Leu L    CCA Pro P    CAA Gln Q    CGA Arg R
         CTG Leu L    CCG Pro P    CAG Gln Q    CGG Arg R
A        ATT Ile I    ACT Thr T    AAT Asn N    AGT Ser S
         ATC Ile I    ACC Thr T    AAC Asn N    AGC Ser S
         ATA Ile I    ACA Thr T    AAA Lys K    AGA Arg R
         ATG Met M    ACG Thr T    AAG Lys K    AGG Arg R
G        GTT Val V    GCT Ala A    GAT Asp D    GGT Gly G
         GTC Val V    GCC Ala A    GAC Asp D    GGC Gly G
         GTA Val V    GCA Ala A    GAA Glu E    GGA Gly G
         GTG Val V    GCG Ala A    GAG Glu E    GGG Gly G

Amino acid changing mutation example: CCC → ACC (Pro → Thr)
Synonymous mutation example: CCC → CCA (Pro → Pro)

Key insight: selection affects amino acid changing mutations but not synonymous ones
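The distinction between the two kinds of mutation can be checked mechanically. Below is a minimal Python sketch, assuming the standard genetic code shown in the codon table; the names `CODON_TABLE` and `classify_mutation` are illustrative and not part of the lecture material.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position
# (the same ordering as the codon table). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_mutation(before, after):
    """Label a codon change as 'synonymous' or 'amino acid changing'."""
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "amino acid changing"


print(classify_mutation("CCC", "ACC"))  # Pro -> Thr
print(classify_mutation("CCC", "CCA"))  # Pro -> Pro
```

In this sketch a change to or from a stop codon is treated as amino acid changing, which is an assumption of the example rather than something the slide states.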
Ka/Ks from the HIV unit: an example scenario

Ancestral codon known (e.g. from old samples): GTA (Valine)
Descendant viral sequences: TTA GTA GCA GTA GTA GTG GTT CTA
We'll assume a star phylogeny
Ka/Ks from the HIV unit: doing the calculation

Ancestor: GTA
Descendants: TTA (aa), CTA (aa), GCA (aa), GTT (syn), GTG (syn), GTA, GTA, GTA

Average proportion of amino acid differences per amino acid changing site: 3/16
Average proportion of synonymous differences per synonymous site: 2/8

Jukes-Cantor correction to get substitutions per site:

$$K_a = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{3}{16}\right) = 0.216$$

$$K_s = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{2}{8}\right) = 0.304$$
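The star-phylogeny calculation can be reproduced with a short Python sketch. It assumes the standard genetic code, and it takes the site counts for GTA (2 amino acid changing sites and 1 synonymous site per sequence) as given, which is consistent with the denominators 16 and 8 for 8 descendant sequences. The function name `jukes_cantor` is illustrative.

```python
import math
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def jukes_cantor(p):
    """Convert a proportion of differing sites p into substitutions per site."""
    return -0.75 * math.log(1 - 4 * p / 3)


ANCESTOR = "GTA"
DESCENDANTS = ["TTA", "GTA", "GCA", "GTA", "GTA", "GTG", "GTT", "CTA"]

# Assumed site counts for the codon GTA (taken as given, not computed here).
AA_SITES_PER_CODON = 2
SYN_SITES_PER_CODON = 1

# Under a star phylogeny, each descendant is compared directly to the ancestor.
changed = [d for d in DESCENDANTS if d != ANCESTOR]
aa_diffs = sum(CODON_TABLE[d] != CODON_TABLE[ANCESTOR] for d in changed)
syn_diffs = sum(CODON_TABLE[d] == CODON_TABLE[ANCESTOR] for d in changed)

ka = jukes_cantor(aa_diffs / (len(DESCENDANTS) * AA_SITES_PER_CODON))
ks = jukes_cantor(syn_diffs / (len(DESCENDANTS) * SYN_SITES_PER_CODON))

print(f"aa differences: {aa_diffs}, synonymous differences: {syn_diffs}")
print(f"Ka = {ka:.3f}, Ks = {ks:.3f}")
```

The correction is only defined for proportions below 3/4; for larger values the logarithm has no real value, so a practical version would need to guard against that case.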
Interpreting the result

Ancestor: GTA
Descendants: CTA (aa), TTA (aa), GCA (aa), GTT (syn), GTG (syn), GTA, GTA, GTA

$$\frac{K_a}{K_s} = \frac{0.216}{0.304} = 0.711$$

Ka/Ks ≈ 1    No selection
Ka/Ks > 1    Positive selection
Ka/Ks < 1    Purifying selection
Ka/Ks in the human genome: some adjustments

Ancestor is unknown
Data is an alignment between two species
Calculate for whole genes (or parts of genes) rather than single codons

Human    TTTTCTCACTGTTCTTTTTCTCAGCCTGTATTTCCATATTTAAATCCTAGAAAATGTGGAGTCCCCATGACTCTGTGCTCACCAAGCTCTTGA
Marmoset TTTTCTAACTGTCATTTTTCTTATCCTGTATTTCCATATTTCAGTCCTATGACATGTGAATTACCCATGACTCTGTGCTCACCAAGCTCTTGA
(partial TRIM5a alignment, human vs. marmoset)
Ka/Ks in a two species alignment

Will calculate one Ka/Ks over whole length

sp1 GGG ACT AAA
sp2 GGA GCT AAA

1. Estimate the number of synonymous and amino acid changing sites
2. Count the number of synonymous and amino acid changing differences
3. Get proportion of differences for each, and correct with Jukes-Cantor
Estimating the number of synonymous and amino acid changing sites

                    codon 1   codon 2   codon 3   total
sp1                 GGG       ACT       AAA
sp2                 GGA       GCT       AAA
aa sites, sp1       2         2         2 ⅔       6 ⅔
aa sites, sp2       2         2         2 ⅔       6 ⅔
Synon sites, sp1    1         1         ⅓         2 ⅓
Synon sites, sp2    1         1         ⅓         2 ⅓

Average number of amino acid sites: 6 ⅔
Average number of synonymous sites: 2 ⅓

Each codon position counts as a fraction of a synonymous site: the fraction of its three possible base changes that leave the amino acid unchanged. In AAA (Lys), for example, only the third-position change to AAG (Lys) is synonymous, so AAA has ⅓ of a synonymous site and 2 ⅔ amino acid changing sites.
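The site counts can be computed by enumerating every single-base change in each codon. The following Python sketch assumes the standard genetic code and treats a change to a stop codon as amino acid changing (an assumption of this example). The helper names `synonymous_sites` and `count_sites` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3).

    Each of the 9 possible single-base changes contributes 1/3 of a site
    if it leaves the encoded amino acid unchanged.
    """
    total = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += Fraction(1, 3)
    return total


def count_sites(seq):
    """Return (amino acid changing sites, synonymous sites) for a coding sequence."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    syn = sum((synonymous_sites(c) for c in codons), Fraction(0))
    aa = 3 * len(codons) - syn  # every site is either synonymous or amino acid changing
    return aa, syn


for name, seq in [("sp1", "GGGACTAAA"), ("sp2", "GGAGCTAAA")]:
    aa, syn = count_sites(seq)
    print(f"{name}: aa sites = {aa}, synonymous sites = {syn}")
```

Using `Fraction` keeps the thirds exact, so the totals can be compared directly with the table (6 ⅔ is 20/3 and 2 ⅓ is 7/3).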
Counting the number of synonymous and amino acid changing differences

                                  codon 1   codon 2   codon 3   total
sp1                               GGG       ACT       AAA
sp2                               GGA       GCT       AAA
Amino acid changing differences   0         1         0         1
Synonymous differences            1         0         0         1

For simplicity we'll assume there is at most 1 difference per codon
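Counting differences codon by codon can be sketched in Python as follows. The example assumes the standard genetic code and enforces the same simplification as above: it raises an error if a codon pair differs at more than one position. The function name `count_differences` is illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_differences(seq1, seq2):
    """Return (amino acid changing differences, synonymous differences)."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a whole number of codons long")
    aa_diffs = syn_diffs = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        n_changed = sum(a != b for a, b in zip(c1, c2))
        if n_changed == 0:
            continue
        if n_changed > 1:
            raise ValueError(f"more than one difference in codon pair {c1}/{c2}")
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn_diffs += 1
        else:
            aa_diffs += 1
    return aa_diffs, syn_diffs


print(count_differences("GGGACTAAA", "GGAGCTAAA"))
```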
Ka: amino acid changing substitution rate

Average proportion of amino acid differences per amino acid changing site: 1 / 6 ⅔

$$K_a = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{1}{6\tfrac{2}{3}}\right) = 0.167$$

Ks: synonymous substitution rate

Average proportion of synonymous differences per synonymous site: 1 / 2 ⅓

$$K_s = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{1}{2\tfrac{1}{3}}\right) = 0.635$$
Ka/Ks

$$\frac{K_a}{K_s} = \frac{0.167}{0.635} = 0.263$$

Ka/Ks ≈ 1    No selection
Ka/Ks > 1    Positive selection
Ka/Ks < 1    Purifying selection
Worksheet (Rip it off from the back of your packet)

Name:

Calculate the Ka/Ks ratio over the following region:

sp1 GTA CCC
sp2 CTA CCA

Use the codon table from the Ka/Ks review.
Worksheet

Calculate the Ka/Ks ratio over the following region:

                                  codon 1   codon 2   total
sp1                               GTA       CCC
sp2                               CTA       CCA
aa sites, sp1                     2         2         4
aa sites, sp2                     1 ⅔       2         3 ⅔
Synon sites, sp1                  1         1         2
Synon sites, sp2                  1 ⅓       1         2 ⅓
Amino acid changing differences   1         0         1
Synonymous differences            0         1         1

Average number of amino acid sites: 3 ⅚
Average number of synonymous sites: 2 ⅙

Aa diffs / aa sites = 1 / 3 ⅚
Syn diffs / syn sites = 1 / 2 ⅙

$$K_a = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{1}{3\tfrac{5}{6}}\right) = 0.321$$

$$K_s = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{1}{2\tfrac{1}{6}}\right) = 0.717$$

$$\frac{K_a}{K_s} = \frac{0.321}{0.717} = 0.447$$
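The three steps of the two-species calculation (estimate sites, count differences, apply the Jukes-Cantor correction) can be combined into one function. This is a minimal Python sketch under the same simplifying assumptions as the worksheet: the standard genetic code, at most one difference per codon, and changes to stop codons counted as amino acid changing. The function names are illustrative, and this is not a substitute for established Ka/Ks software.

```python
import math
from fractions import Fraction
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def synonymous_sites(codon):
    """Each single-base change that keeps the amino acid adds 1/3 of a site."""
    total = Fraction(0)
    for pos in range(3):
        for base in BASES:
            mutant = codon[:pos] + base + codon[pos + 1:]
            if base != codon[pos] and CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += Fraction(1, 3)
    return total


def jukes_cantor(p):
    """Convert a proportion of differing sites into substitutions per site."""
    return -0.75 * math.log(1 - 4 * float(p) / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks, Ka/Ks) for two aligned coding sequences."""
    c1, c2 = split_codons(seq1), split_codons(seq2)
    # Step 1: number of sites, averaged over the two species.
    syn_sites = sum((synonymous_sites(c) for c in c1 + c2), Fraction(0)) / 2
    aa_sites = 3 * len(c1) - syn_sites
    # Step 2: number of differences of each kind.
    aa_diffs = syn_diffs = 0
    for a, b in zip(c1, c2):
        n_changed = sum(x != y for x, y in zip(a, b))
        if n_changed > 1:
            raise ValueError(f"more than one difference in codon pair {a}/{b}")
        if n_changed == 1:
            if CODON_TABLE[a] == CODON_TABLE[b]:
                syn_diffs += 1
            else:
                aa_diffs += 1
    # Step 3: proportions of differences, corrected with Jukes-Cantor.
    ka = jukes_cantor(aa_diffs / aa_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka, ks, ka / ks  # fails if there are no synonymous differences (Ks = 0)


for label, s1, s2 in [("example", "GGGACTAAA", "GGAGCTAAA"),
                      ("worksheet", "GTACCC", "CTACCA")]:
    ka, ks, ratio = ka_ks(s1, s2)
    print(f"{label}: Ka = {ka:.3f}, Ks = {ks:.3f}, Ka/Ks = {ratio:.3f}")
```

The ratio here is computed from unrounded Ka and Ks; for both the worked example and the worksheet it still agrees with the hand calculation to three decimal places.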
Most of the human genome is under purifying selection

Mouse genome paper:
Identified 12,845 human-mouse ortholog pairs
Median Ka/Ks = 0.115
TRIM5a: an example of positive selection in the human genome

[Figure: TRIM5a]
TRIM5a: an example of positive selection in the human genome

Human    TTTTCTCACTGTTCTTTTTCTCAGCCTGTATTTCCATATTTAAATCCTAGAAAATGTGGAGTCCCCATGACTCTGTGCTCACCAAGCTCTTGA
Marmoset TTTTCTAACTGTCATTTTTCTTATCCTGTATTTCCATATTTCAGTCCTATGACATGTGAATTACCCATGACTCTGTGCTCACCAAGCTCTTGA
(partial TRIM5a alignment, human vs. marmoset)

Ka/Ks > 1.1

http://www.pnas.org/content/102/8/2832.full
Topics for today

Ka/Ks in the human genome
What's in the non-protein-coding part of the genome?
What's in the non-protein-coding part of the genome?
Reverse transcriptase!

Similarity search: HIV-1 reverse transcriptase against human genome (HIV negative person)
834 hits!

Image: http://commons.wikimedia.org/wiki/file:nhgri_human_male_karyotype.png
Matches for RT are not inside genes
A genomic parasite (or transposon) replicating

[Figure: individual human cell]
If replication occurs in a reproductive cell it can be passed to subsequent generations.

Insertions now represent a new mutation in the human population.
Genomes full of parasites

genome     transposon content
chicken    8.5%
mouse      38%
human      46%
wheat      68%
How a LINE transposon works

Host encoded RNA polymerase transcribes the LINE into an RNA intermediate
Host encoded ribosome translates the RNA into the LINE encoded endonuclease / reverse transcriptase
LINE encoded endonuclease / reverse transcriptase travels together with the RNA intermediate
Endonuclease makes DNA break in new location
Reverse transcriptase copies RNA into DNA and puts it in the new location
SINEs parasitize LINEs

SINE RNA intermediate does not code for protein.
It hijacks the LINE endonuclease / reverse transcriptase.
Most transposon insertions are neutral
Occasionally transposon insertions are deleterious: hemophilia example

F8 gene codes for blood coagulation factor (factor VIII)
[Figure: normal allele and disease allele with SINE insertion; blowup of 14th exon]
SINE insertion causes premature stop codon
Males carrying the disease allele (F8 is X-linked) and homozygous females have hemophilia
Very occasionally transposon insertions can be beneficial...

[Figure: RAG is produced by transcription + translation and carries out VDJ recombination]
RAG originated from a DNA editing enzyme carried by a transposon

Kapitonov VV, Jurka J (2005) RAG1 Core and V(D)J Recombination Signal Sequences Were Derived from Transib Transposons. PLoS Biol 3(6): e181. doi:10.1371/journal.pbio.0030181
http://journals.plos.org/plosbiology/article?id=info:doi/10.1371/journal.pbio.0030181

Image: https://commons.wikimedia.org/wiki/file:archaeology.rome.arp.jpg
Consider a LINE transposon insertion into a non-functional region of the genome. This insertion occurred before the divergence of human and mouse. Imagine we obtain the sequence for this transposon from human and mouse, and create an alignment between the reverse transcriptase DNA sequence in each. What would you expect the Ka/Ks ratio to be in this sequence? Explain.

[Figure: ancestral transposon insertion, inherited by both the human and mouse lineages]
Answer: An insertion in a non-functional region is likely to be neither deleterious nor advantageous. Thus we would expect no selection at all: amino acid changing and synonymous substitutions accumulate at the same rate per site, giving a Ka/Ks ratio of about 1.
Practical uses of transposons: provide neutral sequence as a baseline for comparative studies
Hand in your worksheet please! (and be sure you put your full name on it)