BIOSTAT516 Statistical Methods in Genetic Epidemiology, Autumn 2005
Handout 1, prepared by Kathleen Kerr and Stephanie Monks

Rationale of Genetic Studies

Some goals of genetic studies include:
- to identify the genetic causes of phenotypic variation
- to develop genetic tests
  o benefits to individuals and to society are still uncertain
- drug development
  o finding genes responsible for a disease, or even a sub-type of disease, provides valuable insight into how pathways could be targeted for drug development
  o identifying genetic profiles associated with adverse drug reactions

Data Explosion!

The amount of data available for use in genetic studies has exploded in the last decade. In the past few years we have seen the release of the first drafts of the 3 billion base pair human genome and the genomes of model organisms. In a recent build of the human genome, annotation data are available for approximately 32,000 genes, with around 18,000 confirmed genes. The typical confirmed human gene has 12 exons of an average length of 236 base pairs each, separated by introns of an average length of 5,478 base pairs. In addition, data are being generated daily on sequence variation between populations. More and more data are becoming available that quantify the expression of these genes at the mRNA and the protein level for a variety of tissues. As the genomes of more and more organisms are sequenced, we have unprecedented homology information between organisms.

The Need for Experimental Design and Statistics

With so much data and so many options, there is a pressing need for well-designed studies that incorporate genetic variation, along with correspondingly accurate and efficient statistical methods.

Our goals for the quarter will be to:
- study potential designs that incorporate genetic data
- learn the corresponding methods for analyzing data from these designs

Our goals in these tasks will be to:
  o understand the basic idea of each type of study
  o know the assumptions each type of analysis depends on for validity
  o understand the limitations of different types of studies
  o learn how to correctly interpret study results

Some Basic Terminology

I recommend Chapter 1 of the Sham text for a quick introduction to these fundamental concepts.

Biologists distinguish two types of cells, eukaryotic cells and prokaryotic cells. Eukaryotic cells differ from prokaryotic cells in that they contain many organelles, small membrane-bound structures inside the cell that carry out specialized functions. In particular, eukaryotic cells have a nucleus. Human beings and probably any animal that you might think of are eukaryotes. Bacteria are prokaryotes.

The nucleus in a eukaryotic cell contains most of the genetic material of the cell (and therefore of the organism); the genetic material is encoded in DNA, which is packaged into chromosomes. The centromere is the attachment site for the spindle fiber that moves the chromosome during cell division. The centromere defines two arms of the chromosome, the short arm p and the long arm q. Chromosomes can be telocentric (centromere at the end), acrocentric (centromere near one end), or metacentric (centromere near the middle).

Chromosomes come in pairs. Chromosomes within a pair carry the same set of genes and are called homologous. Chromosomes that carry different sets of genes are called nonhomologous. In humans, the pair that determines an individual's sex is called the sex chromosomes. All other chromosomes are referred to as autosomes.

Every species has its own characteristic number of different chromosomes n. Humans have 23 pairs of chromosomes: 22 pairs of autosomes and one pair of sex chromosomes. The autosomes are numbered 1-22 from largest to smallest (except that #22 is actually slightly larger than #21). Therefore, there are 46 chromosomes in a human somatic cell. In humans, there are two sex chromosomes, X and Y. Females have two X chromosomes and males have one X and one Y. The mechanism of sex determination is different in different species.

Mitosis is cell division that yields two identical diploid cells, which have two of each chromosome. Meiosis is a special type of cell division that happens in reproductive tissue, yielding haploid cells (which have one of each chromosome) called gametes. In females, the gametes are the egg cells and in males the gametes are the sperm cells.

Genetically, a chromosome is just a long string of DNA. DNA is a biochemical molecule, but quantitative scientists think of it more as information. We think of DNA as a long string of letters that come from a four-letter alphabet: A, T, G, C (Adenine, Thymine, Guanine, Cytosine). DNA is a double-stranded molecule, with each strand made up of A's, T's, G's, and C's. A very important property of DNA is complementary base pairing between the two strands (see the figure on the next-to-last page): A and T always pair and G and C always pair. Complementary base pairing means that each single strand of DNA contains all the information for recreating the full double-stranded molecule.

[Figure labels: DNA Molecule, Cell, Nucleus, Chromosome, Gene, Nucleotides]

Some substrings of DNA encode a recipe. These substrings are genes. Specifically, a gene is a sequence of DNA that is transcribed into mRNA (messenger RNA), which, in turn, is translated into protein. Proteins are strings of amino acids. There are twenty different amino acids.
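The claim that one strand carries all the information needed to rebuild the other can be illustrated with a few lines of Python. This is only a minimal sketch: the function names and the short example sequence are chosen here for illustration and are not part of the handout's methods.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the paired base at each position, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


strand = "TGCATGCATGGTTGCA"    # illustrative sequence, read 5' to 3'
partner = complement(strand)   # partner strand, read 3' to 5'
print(partner)

# Pairing is symmetric, so complementing the partner recovers the original strand.
print(complement(partner) == strand)
```

Because the two strands run in opposite directions, the partner is often written reversed (`reverse_complement`) so that it, too, reads 5' to 3'.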

The genetic code is the codebook that gives the correspondence between DNA and protein. Every triplet of DNA bases (a codon) corresponds to a specific amino acid, or else signals START or STOP. The genetic code is almost universal across species.

[Figure: from gene to protein. Labels: Promoter; DNA with exons I, II, III separated by introns; Transcription; Splicing; mRNA with exons I, II, III; Translation; Protein]

Double-stranded DNA:
5'...TGCATGCATGGTTGCA...3'   Coding or sense strand
3'...ACGTACGTACCAACGT...5'   Template or anti-sense strand

Transcription reads the template strand from 3' to 5' to produce mRNA:
mRNA 5'...UGCAUGCAUGGUUGCA...3'

Translation reads the mRNA from 5' to 3' to produce polypeptides:
N-terminal...Cys Met His Gly Cys...C-terminal

1. Note that the coding strand is the one that is not used in transcribing the mRNA molecule.
2. In transcription, the template strand is read in the 3' to 5' direction to produce mRNA.
3. In translation, the mRNA is read from 5' to 3' to produce proteins.

A specific location on a chromosome, for instance the location of a gene, a SNP (single-nucleotide polymorphism), or another genetic marker, is a locus (plural: loci). There can be more than one form of a locus. These forms are called alleles. When there is more than one allele at a locus, the locus is said to be polymorphic.
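The transcription and translation example above can be checked with a short Python sketch. The function names are hypothetical, and the sketch makes two simplifying assumptions that the handout does not: translation starts at the first base supplied rather than at a START codon, and any incomplete trailing codon is ignored. The codon table is the standard genetic code written compactly in U, C, A, G order.

```python
from itertools import product

# Standard genetic code: one letter per codon in UCAG order, "*" marks STOP.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}

# One-letter to three-letter amino acid abbreviations.
THREE_LETTER = {
    "F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "C": "Cys", "W": "Trp",
    "P": "Pro", "H": "His", "Q": "Gln", "R": "Arg", "I": "Ile", "M": "Met",
    "T": "Thr", "N": "Asn", "K": "Lys", "V": "Val", "A": "Ala", "D": "Asp",
    "E": "Glu", "G": "Gly", "*": "STOP",
}

# RNA base that pairs with each DNA base of the template strand.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_from_template(template_3_to_5: str) -> str:
    """Build the mRNA (5' to 3') by pairing against the template read 3' to 5'."""
    return "".join(RNA_PAIR[base] for base in template_3_to_5)


def translate(mrna: str) -> list:
    """Translate codon by codon from the first base; stop at a STOP codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = THREE_LETTER[CODON_TABLE[mrna[i:i + 3]]]
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide


template = "ACGTACGTACCAACGT"          # template strand, written 3' to 5'
mrna = transcribe_from_template(template)
print(mrna)                            # UGCAUGCAUGGUUGCA
print(" ".join(translate(mrna)))       # Cys Met His Gly Cys
```

The mRNA produced this way is the coding strand with U in place of T, which is why the coding strand can be read directly when looking up codons.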

When two haploid gametes unite, the complete diploid number of chromosomes is reinstated. For each chromosome pair, an individual therefore has one chromosome of maternal origin and one chromosome of paternal origin. Thus for a given locus an individual will have one allele of maternal origin and one allele of paternal origin. These define an individual's genotype. If an individual has two copies of the same allele, then that individual is homozygous at that locus. If an individual has two different alleles at a locus, then s/he is heterozygous.

Mendel's First Law states that the two members of a gene pair segregate (separate) from each other into the gametes, so that one-half of the gametes carry one member of the pair and the other one-half of the gametes carry the other member of the gene pair.

Gregor Mendel conducted pioneering work in genetics, performing breeding experiments in plants. It is useful to consider some experiments similar to Mendel's to become proficient in the basic concepts of genetics. Here are some basic exercises that should help you master these background concepts.

1. It is known that about 22 percent of the double-stranded DNA of an organism consists of thymine. Can the other base percentages be determined? If so, what are they?
   Answer: If T is 22% then A must also be 22% due to complementary base pairing. This accounts for 44% of the composition. C and G must then account for the remaining 56%, and since they must also be equal, each accounts for 28%.

2. Double-stranded DNA with 300 nucleotide pairs has a base composition of A=0.32, G=0.18, C=0.18, and T=0.32. Assume that a single strand of RNA is transcribed from this gene. Can you determine, from the information given, the base composition of the RNA? If so, what is it?
   Answer: This cannot be determined from the information given, because the coding strand of DNA could have, for example, all A and G and the template strand could be entirely C and T, or vice versa. These are extreme cases, but they show that the base composition of the RNA could vary wildly.

3. A certain DNA virus has a base ratio of (A+G)/(C+T)=0.85. Is this single- or double-stranded DNA? Explain.
   Answer: It must be single-stranded. In double-stranded DNA, A=T and G=C, so A+G=C+T and the ratio would be 1.
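The arithmetic behind exercises 1 to 3 can be reproduced with a short Python sketch. The helper names are hypothetical, and the two strands built for exercise 2 are the extreme case described in the answer (one strand all A and G, the other all T and C), not real sequence data.

```python
from collections import Counter


def composition(seq):
    """Fraction of each base in a sequence."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in sorted(counts)}


def from_thymine_fraction(t):
    """Exercise 1: in double-stranded DNA, A = T and G = C."""
    a = t
    g = c = (1 - a - t) / 2
    return {"A": a, "C": c, "G": g, "T": t}


def purine_pyrimidine_ratio(comp):
    """Exercise 3: (A+G)/(C+T), which equals 1 for double-stranded DNA."""
    return (comp["A"] + comp["G"]) / (comp["C"] + comp["T"])


print(from_thymine_fraction(0.22))

# Exercise 2, extreme case: 300 pairs = 600 nucleotides,
# so A = T = 0.32 * 600 = 192 and G = C = 0.18 * 600 = 108.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
strand_1 = "A" * 192 + "G" * 108
strand_2 = "".join(DNA_PAIR[b] for b in strand_1)
duplex = composition(strand_1 + strand_2)
print(duplex)
print(purine_pyrimidine_ratio(duplex))

# RNA pairs against its template, with U in place of T.
# Base composition does not depend on reading direction, so order is ignored.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}
rna_from_strand_1 = "".join(RNA_PAIR[b] for b in strand_1)
rna_from_strand_2 = "".join(RNA_PAIR[b] for b in strand_2)
print(composition(rna_from_strand_1))   # only U and C
print(composition(rna_from_strand_2))   # only A and G
```

Both choices of template come from the same duplex with the stated overall composition, yet the two RNAs share no bases at all, which is the point of the answer to exercise 2.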

4. Consider a DNA triplet pair:
   3'-GTC-5'
   5'-CAG-3'
   where the top strand is the template strand that is transcribed into mRNA. Which amino acid does the triplet code for?
   Answer: We read the coding strand from 5' to 3' to see that the codon is CAG, which codes for Glutamine.

5. 5'...TCGTTTAAGGGCTTGTGCGCCACGGAT...3' coding strand
   3'...AGCAAATTCCCGAACACGCGGTGCCTA...5' template strand
   [Markers 1, 2, and 3 indicate positions along the sequence, in that order from left to right.]

   (a) What are the first three amino acids in the sequence?
       Answer: Ser Phe Lys
   (b) A base is added as the result of exposure to acridine dye (this is called a frameshift mutation). At which position (2 or 3) would it likely have the most damaging effect on the gene product? Explain.
       Answer: Since translation happens in the 5' to 3' direction, an added base at position 2 is likely more damaging, since this would affect more codons.
   (c) The base guanine is added at position 1. What effect would it have on the gene product?
       Answer: The new sequence would be: TCG TGT TAA
       In mRNA form: UCG UGU UAA
       which would code for: Ser Cys STOP
       Therefore, the second amino acid is Cys instead of Phe and translation stops prematurely.
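The frameshift in exercise 5(c) can be reproduced with a short Python sketch. Two things here are assumptions rather than part of the handout: the function name `translate`, and the insertion index, which is inferred from the stated answer (TCG TGT TAA results when G is inserted after the fourth base of the coding strand). Amino acids are printed in one-letter codes (S = Ser, F = Phe, K = Lys, C = Cys).

```python
from itertools import product

# Standard genetic code: one letter per codon in UCAG order, "*" marks STOP.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}


def translate(coding_strand: str) -> str:
    """Translate a coding strand (5' to 3') from its first base until STOP."""
    mrna = coding_strand.replace("T", "U")   # mRNA matches the coding strand
    peptide = ""
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide += residue
    return peptide


coding = "TCGTTTAAGGGCTTGTGCGCCACGGAT"

# Assumed location of position 1: after the fourth base (inferred, see above).
insert_at = 4
mutant = coding[:insert_at] + "G" + coding[insert_at:]

print(translate(coding))   # original reading frame
print(translate(mutant))   # shifted frame: premature STOP after two residues
```

Moving `insert_at` further toward the 3' end leaves more of the original codons intact, which is the reasoning behind the answer to part (b).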

The RNA Codons

First                     Second nucleotide                      Third
nucleotide   U           C           A           G           nucleotide
U            UUU Phe     UCU Ser     UAU Tyr     UGU Cys     U
             UUC Phe     UCC Ser     UAC Tyr     UGC Cys     C
             UUA Leu     UCA Ser     UAA STOP    UGA STOP    A
             UUG Leu     UCG Ser     UAG STOP    UGG Trp     G
C            CUU Leu     CCU Pro     CAU His     CGU Arg     U
             CUC Leu     CCC Pro     CAC His     CGC Arg     C
             CUA Leu     CCA Pro     CAA Gln     CGA Arg     A
             CUG Leu     CCG Pro     CAG Gln     CGG Arg     G
A            AUU Ile     ACU Thr     AAU Asn     AGU Ser     U
             AUC Ile     ACC Thr     AAC Asn     AGC Ser     C
             AUA Ile     ACA Thr     AAA Lys     AGA Arg     A
             AUG Met*    ACG Thr     AAG Lys     AGG Arg     G
G            GUU Val     GCU Ala     GAU Asp     GGU Gly     U
             GUC Val     GCC Ala     GAC Asp     GGC Gly     C
             GUA Val     GCA Ala     GAA Glu     GGA Gly     A
             GUG Val     GCG Ala     GAG Glu     GGG Gly     G

* AUG codes for Methionine (Met) or START.

Abbreviations: Ala = Alanine, Arg = Arginine, Asn = Asparagine, Asp = Aspartic acid, Cys = Cysteine, Gln = Glutamine, Glu = Glutamic acid, Gly = Glycine, His = Histidine, Ile = Isoleucine, Leu = Leucine, Lys = Lysine, Met = Methionine, Phe = Phenylalanine, Pro = Proline, Ser = Serine, Thr = Threonine, Trp = Tryptophan, Tyr = Tyrosine, Val = Valine.