Disease and selection in the human genome 3

Ka/Ks revisited

Please sit in row K or forward

RBFD: human populations, adaptation and immunity

(Photo: Neandertal Museum, Mettmann, Germany)

Sequence genome
Measure expression of innate immune cells in response to pathogen proteins: bacterial antigen 1, bacterial antigen 2, viral antigen 1

Population A individual 1   GCCAACCGGAATGTGTA... TAGGAGAAGCGTAAG...
Population A individual 2   ACCATCCGGAATGTGTA... TAGGAGAAGCGTAAG...
Population B individual 1   ACCAACCGGAATGTGTA... TAGGAGAAGCGCAAG...

Goal: identify SNPs which explain expression differences

Topics for today
Ka/Ks in the human genome
What's in the non-protein-coding part of the genome?

Ka/Ks from the HIV unit: a brief review

Codon table (rows: first letter; columns: second letter)

        T            C            A            G
T   TTT Phe F    TCT Ser S    TAT Tyr Y    TGT Cys C
    TTC Phe F    TCC Ser S    TAC Tyr Y    TGC Cys C
    TTA Leu L    TCA Ser S    TAA Stop     TGA Stop
    TTG Leu L    TCG Ser S    TAG Stop     TGG Trp W
C   CTT Leu L    CCT Pro P    CAT His H    CGT Arg R
    CTC Leu L    CCC Pro P    CAC His H    CGC Arg R
    CTA Leu L    CCA Pro P    CAA Gln Q    CGA Arg R
    CTG Leu L    CCG Pro P    CAG Gln Q    CGG Arg R
A   ATT Ile I    ACT Thr T    AAT Asn N    AGT Ser S
    ATC Ile I    ACC Thr T    AAC Asn N    AGC Ser S
    ATA Ile I    ACA Thr T    AAA Lys K    AGA Arg R
    ATG Met M    ACG Thr T    AAG Lys K    AGG Arg R
G   GTT Val V    GCT Ala A    GAT Asp D    GGT Gly G
    GTC Val V    GCC Ala A    GAC Asp D    GGC Gly G
    GTA Val V    GCA Ala A    GAA Glu E    GGA Gly G
    GTG Val V    GCG Ala A    GAG Glu E    GGG Gly G

Amino acid changing mutation example: CCC → ACC (Pro → Thr)
Synonymous mutation example: CCC → CCA (Pro → Pro)

Key insight: selection affects amino acid changing mutations but not synonymous ones
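The distinction between the two kinds of mutation can be checked mechanically against the codon table. The following is a minimal sketch, not part of the original slides; the names `CODON_TABLE` and `classify_mutation` are illustrative, and stop codons are written as `*`.

```python
from itertools import product

# Standard genetic code, listed in the same T, C, A, G order as the slide's table.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_mutation(before, after):
    """Label a codon change as synonymous or amino acid changing."""
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "amino acid changing"


print(classify_mutation("CCC", "ACC"))  # Pro -> Thr
print(classify_mutation("CCC", "CCA"))  # Pro -> Pro
```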

Ka/Ks from the HIV unit: an example scenario
Ancestral codon, known (e.g. from old samples): GTA (Valine)
Descendant viral sequences: TTA, GTA, GCA, GTA, GTA, GTG, GTT, CTA
We'll assume a star phylogeny

Ka/Ks from the HIV unit: doing the calculation
Ancestor: GTA
Descendants: TTA (aa), CTA (aa), GCA (aa), GTT (syn), GTG (syn), GTA, GTA, GTA

Average proportion of amino acid differences per amino acid changing site: 3/16
Average proportion of synonymous differences per synonymous site: 2/8

Jukes-Cantor correction to get substitutions per site:

$K_a = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{3}{16}\right) = 0.216$

$K_s = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{2}{8}\right) = 0.304$
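The arithmetic on this slide can be reproduced in a few lines. This is a sketch rather than part of the original lecture; the function name `jukes_cantor` is an assumption chosen for illustration. The denominators come from GTA having 2 amino acid changing sites and 1 synonymous site, multiplied across the 8 descendant sequences.

```python
import math


def jukes_cantor(p):
    """Convert an observed proportion of differences p into substitutions per site."""
    return -0.75 * math.log(1 - 4 * p / 3)


# 3 amino acid changing differences over 8 * 2 = 16 amino acid changing sites
ka = jukes_cantor(3 / 16)
# 2 synonymous differences over 8 * 1 = 8 synonymous sites
ks = jukes_cantor(2 / 8)

print(f"Ka = {ka:.3f}")  # Ka = 0.216
print(f"Ks = {ks:.3f}")  # Ks = 0.304
```

Dividing the unrounded values gives a ratio of roughly 0.710; the 0.711 quoted in the slides comes from dividing the rounded values 0.216 and 0.304.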

Interpreting the result

$K_a / K_s = 0.216 / 0.304 = 0.711$

Ka/Ks ≈ 1: no selection
Ka/Ks > 1: positive selection
Ka/Ks < 1: purifying selection

Ka/Ks in the human genome: some adjustments
Ancestor is unknown
Data is an alignment between two species
Calculate for whole genes (or parts of genes) rather than single codons

Human     TTTTCTCACTGTTCTTTTTCTCAGCCTGTATTTCCATATTTAAATCCTAGAAAATGTGGAGTCCCCATGACTCTGTGCTCACCAAGCTCTTGA
Marmoset  TTTTCTAACTGTCATTTTTCTTATCCTGTATTTCCATATTTCAGTCCTATGACATGTGAATTACCCATGACTCTGTGCTCACCAAGCTCTTGA
(partial TRIM5α alignment, human vs. marmoset)

Ka/Ks in a two species alignment
We will calculate one Ka/Ks over the whole length

sp1  GGG ACT AAA
sp2  GGA GCT AAA

1. Estimate the number of synonymous and amino acid changing sites
2. Count the number of synonymous and amino acid changing differences
3. Get the proportion of differences for each, and correct with Jukes-Cantor

Estimating the number of synonymous and amino acid changing sites

                   codon 1   codon 2   codon 3   total
sp1                GGG       ACT       AAA
sp2                GGA       GCT       AAA
aa sites sp1       2         2         2 ⅔       6 ⅔
aa sites sp2       2         2         2 ⅔       6 ⅔
Synon sites sp1    1         1         ⅓         2 ⅓
Synon sites sp2    1         1         ⅓         2 ⅓

Average number of amino acid sites: 6 ⅔
Average number of synonymous sites: 2 ⅓
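The fractional site counts in the table follow from asking, for each position of a codon, what fraction of the three possible single-base changes leave the amino acid unchanged. Below is a sketch of that counting rule, assuming that changes to a stop codon count as amino acid changing; the names `synonymous_sites` and `count_sites` are illustrative. Exact fractions are used so the results match the thirds on the slide.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Number of synonymous sites in one codon (each position contributes 0 to 1)."""
    total = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += Fraction(1, 3)  # one of three possible changes at this position
    return total


def count_sites(seq):
    """Return (amino acid changing sites, synonymous sites) for an in-frame sequence."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    syn = sum(synonymous_sites(c) for c in codons)
    return 3 * len(codons) - syn, syn


aa_sites, syn_sites = count_sites("GGGACTAAA")
print(aa_sites, syn_sites)  # 20/3 7/3, i.e. 6 2/3 and 2 1/3
```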

Counting the number of synonymous and amino acid changing differences

                                  codon 1   codon 2   codon 3   total
sp1                               GGG       ACT       AAA
sp2                               GGA       GCT       AAA
Amino acid changing differences   0         1         0         1
Synonymous differences            1         0         0         1

For simplicity we'll assume there is at most 1 difference per codon
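Counting differences codon by codon might be sketched as follows. The function name `count_differences` is hypothetical, and the code enforces the slide's simplifying assumption of at most one difference per codon by raising an error otherwise.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_differences(seq1, seq2):
    """Return (amino acid changing differences, synonymous differences)."""
    aa_diffs = syn_diffs = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        changed = sum(a != b for a, b in zip(c1, c2))
        if changed == 0:
            continue
        if changed > 1:
            raise ValueError(f"more than one difference in codon pair {c1}/{c2}")
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn_diffs += 1
        else:
            aa_diffs += 1
    return aa_diffs, syn_diffs


print(count_differences("GGGACTAAA", "GGAGCTAAA"))  # (1, 1)
```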

Ka
Average proportion of amino acid differences per amino acid changing site: 1 / 6 ⅔

$K_a = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{1}{6\frac{2}{3}}\right) = 0.167$ (amino acid changing substitution rate)

Ks
Average proportion of synonymous differences per synonymous site: 1 / 2 ⅓

$K_s = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{1}{2\frac{1}{3}}\right) = 0.635$ (synonymous substitution rate)

Ka/Ks

$K_a / K_s = 0.167 / 0.635 = 0.263$

Ka/Ks ≈ 1: no selection
Ka/Ks > 1: positive selection
Ka/Ks < 1: purifying selection
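Putting the three steps together, the whole two-species calculation might be sketched as below. This is an illustration of the simplified method in these slides, not a production estimator: the function names are assumptions, each differing codon pair is counted as a single difference, and there is no handling for alignments with zero synonymous differences (where Ks would be 0).

```python
import math
from fractions import Fraction
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def synonymous_sites(seq):
    """Step 1: total synonymous sites in a sequence (thirds of a site per change)."""
    total = Fraction(0)
    for codon in split_codons(seq):
        for pos, base in product(range(3), BASES):
            mutant = codon[:pos] + base + codon[pos + 1:]
            if mutant != codon and CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += Fraction(1, 3)
    return total


def jukes_cantor(p):
    """Step 3: correct a proportion of differences to substitutions per site."""
    return -0.75 * math.log(1 - 4 * float(p) / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks, Ka/Ks) for two aligned, in-frame sequences of equal length."""
    syn_sites = (synonymous_sites(seq1) + synonymous_sites(seq2)) / 2
    aa_sites = len(seq1) - syn_sites  # every site is one or the other
    # Step 2: count differing codon pairs by type
    aa_diffs = syn_diffs = 0
    for c1, c2 in zip(split_codons(seq1), split_codons(seq2)):
        if c1 == c2:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn_diffs += 1
        else:
            aa_diffs += 1
    ka = jukes_cantor(aa_diffs / aa_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka, ks, ka / ks


ka, ks, ratio = ka_ks("GGGACTAAA", "GGAGCTAAA")
print(f"Ka = {ka:.3f}, Ks = {ks:.3f}, Ka/Ks = {ratio:.3f}")
# Ka = 0.167, Ks = 0.635, Ka/Ks = 0.263
```

The same function can be applied to any short in-frame alignment that satisfies the one-difference-per-codon assumption.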

Worksheet
(Rip it off from the back of your packet)

Name:

Calculate the Ka/Ks ratio over the following region:

sp1  GTA CCC
sp2  CTA CCA

Worksheet: solution
Calculate the Ka/Ks ratio over the following region:

                                  codon 1   codon 2   total
sp1                               GTA       CCC
sp2                               CTA       CCA
aa sites sp1                      2         2         4
aa sites sp2                      1 ⅔       2         3 ⅔
Synon sites sp1                   1         1         2
Synon sites sp2                   1 ⅓       1         2 ⅓
Amino acid changing differences   1         0         1
Synonymous differences            0         1         1

Average number of amino acid sites: 3 ⅚
Average number of synonymous sites: 2 ⅙

Aa diffs / aa sites = 1 / 3 ⅚
Syn diffs / syn sites = 1 / 2 ⅙

$K_a = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{1}{3\frac{5}{6}}\right) = 0.321$

$K_s = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{1}{2\frac{1}{6}}\right) = 0.717$

$K_a / K_s = 0.321 / 0.717 = 0.447$

Most of the human genome is under purifying selection
Mouse genome paper: identified 12,845 ortholog pairs
Median Ka/Ks = 0.115

TRIM5α: an example of positive selection in the human genome

Human     TTTTCTCACTGTTCTTTTTCTCAGCCTGTATTTCCATATTTAAATCCTAGAAAATGTGGAGTCCCCATGACTCTGTGCTCACCAAGCTCTTGA
Marmoset  TTTTCTAACTGTCATTTTTCTTATCCTGTATTTCCATATTTCAGTCCTATGACATGTGAATTACCCATGACTCTGTGCTCACCAAGCTCTTGA
(partial TRIM5α alignment, human vs. marmoset)

Ka/Ks > 1.1
http://www.pnas.org/content/102/8/2832.full

What's in the non-protein-coding part of the genome?

Reverse transcriptase!
Similarity search with HIV-1 reverse transcriptase against the human genome (HIV-negative person): 834 hits!
Image: http://commons.wikimedia.org/wiki/file:nhgri_human_male_karyotype.png

Matches for RT are not inside genes

A genomic parasite (or transposon) replicating
(Figure: an individual human cell)

If replication occurs in a reproductive cell, it can be passed to subsequent generations.
Insertions now represent a new mutation in the human population.

Genomes full of parasites

genome    transposon content
chicken   8.5%
mouse     38%
human     46%
wheat     68%

How a LINE transposon works
1. Host-encoded RNA polymerase transcribes the LINE into an RNA intermediate
2. Host-encoded ribosome translates the RNA into the LINE-encoded endonuclease / reverse transcriptase, which stays together with the RNA intermediate
3. Endonuclease makes a DNA break in a new location
4. Reverse transcriptase copies the RNA into DNA and puts it in the new location

SINEs parasitize LINEs
The SINE RNA intermediate does not code for protein.
It hijacks the LINE endonuclease / reverse transcriptase.

Most transposon insertions are neutral

Occasionally transposon insertions are deleterious: hemophilia example
(Figure: normal allele vs. disease allele with SINE insertion; blowup of the 14th exon)
F8 gene codes for blood coagulation factor
SINE insertion causes premature stop codon
Homozygotes for disease allele have hemophilia

Very occasionally transposon insertions can be beneficial
(Figure labels: VDJ recombination; transcription + translation; RAG)

RAG originated from a DNA editing enzyme carried by a transposon
Kapitonov VV, Jurka J (2005) RAG1 Core and V(D)J Recombination Signal Sequences Were Derived from Transib Transposons. PLoS Biol 3(6): e181. doi:10.1371/journal.pbio.0030181
http://journals.plos.org/plosbiology/article?id=info:doi/10.1371/journal.pbio.0030181
Image: https://commons.wikimedia.org/wiki/file:archaeology.rome.arp.jpg

Consider a LINE transposon insertion into a non-functional region of the genome. This insertion occurred before the divergence of human and mouse. Imagine we obtain the sequence for this transposon from human and mouse, and create an alignment between the reverse transcriptase DNA sequence in each. What would you expect the Ka/Ks ratio to be in this sequence? Explain.
(Figure labels: ancestral transposon insertion; human; mouse)

Answer: An insertion in a non-functional region is likely to be neither deleterious nor advantageous. Thus we would expect no selection at all, and a Ka/Ks ratio of about 1.

Practical uses of transposons: provide neutral sequence as a baseline for comparative studies

Hand in your worksheet please! (and be sure you put your full name on it)