Linkage Analysis, Computational Genomics, Seyoung Kim


Linkage Analysis
02-710 Computational Genomics
Seyoung Kim

Genome Polymorphisms
[Figure labels: Genetic Variation, Phenotypic Variation]

A Human Genealogy
TCGAGGTATTAAC: the ancestral chromosome

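SNPs and Human Genealogy
Mutations on the branches of the genealogy: A->G, G->T, T->C, T->C, A->G
Present-day chromosomes:
  TCGAGGTATTAAC
  TCTAGGTATTAAC
  TCGAGGCATTAAC
  TCTAGGTGTTAAC
  TCGAGGTATTAGC
  TCTAGGTATCAAC
(Asterisks in the figure mark the polymorphic sites.)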

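SNPs and Human Genealogy
Mutations on the branches of the genealogy: T->C, A->G, G->T, T->C, A->G, plus a disease mutation
Present-day chromosomes (each one is a haplotype):
  TCGAGGTATTAAC
  TCTAGGTATTAAC
  TCGAGGCATTAAC
  TCTAGGTGTTAAC
  TCGAGGTATTAGC
  TCTAGGTATCAAC
(Asterisks in the figure mark the polymorphic sites.)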

Identifying Disease Loci
All individuals are related if we go back far enough in the ancestry.
(Balding, Nature Reviews Genetics, 2006)

Overview
How can we identify the genetic loci responsible for determining phenotypes?
Linkage analysis
  Data are collected for family members
  Difficult to collect data on a large number of families
  Effective for rare diseases
  Low resolution on the genome because only a few recombinations occur within families, giving a large region of linkage
Genome-wide association studies
  Data are collected for unrelated individuals
  Easier to find a large number of affected individuals
  Effective for common diseases, compared to family-based methods
  Relatively high resolution for pinpointing the locus linked to the phenotype

Linkage Analysis vs. Association Analysis
(Strachan & Read, Human Molecular Genetics, 2001)

Mendel's two laws
Modern genetics began with Mendel's experiments on garden peas. He studied seven contrasting pairs of characters, including:
  The form of ripe seeds: round, wrinkled
  The color of the seed albumen: yellow, green
  The length of the stem: long, short
Mendel's first law: Characters are controlled by pairs of genes which separate during the formation of the reproductive cells (meiosis).
  Example: an Aa individual produces gametes A and a.

Mendel's two laws
Mendel's second law: When two or more pairs of genes segregate simultaneously, they do so independently.
  Example: an Aa; Bb individual produces gametes AB, Ab, aB and ab.

Exceptions to Mendel's Second Law
Morgan's fruit fly data (1909): 2,839 flies
  Eye color  A: red, a: purple
  Wing length  B: normal, b: vestigial
Crosses: AABB x aabb gives AaBb; then AaBb x aabb.

  Offspring   AaBb    Aabb   aaBb    aabb
  Expected     710     710    710     710
  Observed   1,339     151    154   1,195

Morgan's explanation
[Figure: chromosome pairs written as haplotype / haplotype]
  Parents: AB/AB x ab/ab
  F1: AB/ab, crossed with ab/ab
  F2: AB/ab and ab/ab (parental); Ab/ab and aB/ab (crossover has taken place)

Recombination
Parental types: AaBb, aabb
Recombinants: Aabb, aaBb
The proportion of recombinants between the two genes (or characters) is called the recombination fraction between these two genes. It is usually denoted by r or θ.
For Morgan's traits: r = (151 + 154)/2839 = 0.107
If r < 1/2: the two genes are said to be linked.
If r = 1/2: independent segregation (Mendel's second law).
Now we move on to (small) pedigrees.
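The arithmetic on this slide can be checked with a short Python sketch. The counts are the ones quoted from Morgan's data; the genotype labels for the two recombinant classes (Aabb and aaBb) and the function name are assumptions made for illustration.

```python
# Observed backcross offspring counts, as quoted from Morgan's data.
observed = {"AaBb": 1339, "Aabb": 151, "aaBb": 154, "aabb": 1195}
# Assumed labelling: the two non-parental classes are the recombinants.
recombinant_classes = ("Aabb", "aaBb")


def recombination_fraction(counts, recombinants):
    """Proportion of offspring that inherited a recombinant gamete."""
    total = sum(counts.values())
    return sum(counts[g] for g in recombinants) / total


r = recombination_fraction(observed, recombinant_classes)
# Under Mendel's second law each of the four classes is equally likely.
expected_if_unlinked = sum(observed.values()) / 4

print(f"r = {r:.3f}")
print(f"expected count per class if unlinked = {expected_if_unlinked:.0f}")
```

Since r is well below 1/2, the two genes are linked in the sense defined above.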

Linkage Analysis
Goal: Identify the unknown disease locus.
Idea: Given pedigree data and a map of genetic markers, look for the markers that are linked to the unknown disease locus (i.e. linkage between the disease locus and the marker locus).
[Figure labels: disease locus; marker near the disease locus (r << 0.5); markers far from the disease locus (r = 0.5)]

Linkage Disequilibrium in Gene Mapping
LD is the non-random association of alleles at different loci.
Genetic recombination breaks down LD.

Linkage Analysis
Parametric linkage analysis
  Need to specify the disease model
  Compute the LOD score based on the model for each marker
  Markers with high LOD scores are considered linked to the disease locus
  Highly effective for Mendelian diseases caused by a single locus
  Usually based on a large pedigree

Probabilistic Graphical Models
[Figure: directed graph over X_1, ..., X_6 with edges X_1 -> X_2, X_2 -> X_3, X_1 -> X_4, X_4 -> X_5, X_2 -> X_6, X_5 -> X_6; each node is annotated with its conditional distribution.]
The joint distribution on $(X_1, X_2, \ldots, X_N)$ factors according to the "parent-of" relations defined by the edges $E$:
$$p(X_1, X_2, X_3, X_4, X_5, X_6) = p(X_1)\, p(X_2 \mid X_1)\, p(X_3 \mid X_2)\, p(X_4 \mid X_1)\, p(X_5 \mid X_4)\, p(X_6 \mid X_2, X_5)$$

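Pedigree as Graphical Models: the Allele Network
Shaded means affected, blank means unaffected.
[Figure: allele network for Grandpa (A0, A1, Ag), Grandma (B0, B1, Bg), Father (F0, F1, Fg), Mother (M0, M1, Mg) and Child (C0, C1, Cg). The nodes with index 0 or 1 are genotype nodes; the nodes with index g are phenotype nodes.]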

Founders and Non-founders
Founders: individuals whose parents are not in the pedigree.
Non-founders: individuals whose parents are in the pedigree.
[Figure: Grandpa, Grandma, Father, Mother, Child]

Probability Models over Pedigree
Founder genotype probabilities
Transmission probabilities: P(child's genotype | father's genotype, mother's genotype)
Penetrance model: P(phenotype | genotype) for each individual
[Figure: the allele network, with genotype nodes A0, A1, B0, B1, F0, F1, M0, M1, C0, C1 and phenotype nodes Ag, Bg, Fg, Mg, Cg]

Probability Models over Pedigree
Genotype probabilities are independent
  across different founders
  across siblings of the same parents (given the parents' genotypes)
The phenotype probability of each individual is independent of all other individuals' genotypes, conditional on their own genotype.
[Figure: the allele network, with genotype nodes A0, A1, B0, B1, F0, F1, M0, M1, C0, C1 and phenotype nodes Ag, Bg, Fg, Mg, Cg]

One Locus: Founder Genotype Probabilities
Assign founder probabilities to their genotypes, assuming Hardy-Weinberg equilibrium (HWE).
Example: If the frequency of D is 0.01, HWE says
  Founder 1 (Dd): P(Dd) = 2 × 0.01 × 0.99
  Founder 2 (dd): P(dd) = (0.99)^2
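As a minimal sketch of the Hardy-Weinberg calculation, the Python function below returns all three founder genotype probabilities for a two-allele locus. The function name and the dictionary layout are illustrative choices; the allele frequency 0.01 is the one used in the example.

```python
def hardy_weinberg(freq_D):
    """Genotype probabilities at a two-allele locus under Hardy-Weinberg equilibrium."""
    if not 0.0 <= freq_D <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    freq_d = 1.0 - freq_D
    return {
        "DD": freq_D ** 2,
        "Dd": 2 * freq_D * freq_d,  # two orderings of the heterozygote
        "dd": freq_d ** 2,
    }


founder = hardy_weinberg(0.01)
for genotype, prob in founder.items():
    print(genotype, prob)
```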

One Locus: Transmission Probabilities
Children get their genes from their parents' genes, independently, according to Mendel's laws:
$$P(G_{ch3} = dd \mid G_{pop1} = Dd,\ G_{mom2} = Dd) = \tfrac{1}{2} \times \tfrac{1}{2}$$
The inheritances are independent for different children.
[Figure: parents 1 and 2, child 3 with genotype dd]

One Locus: Transmission Probabilities
Children get their genes from their parents' genes, independently, according to Mendel's laws:
$$P(G_{ch3} = Dd \mid G_{pop1} = Dd,\ G_{mom2} = Dd) = \left(\tfrac{1}{2} \times \tfrac{1}{2}\right) \times 2$$
The factor 2 comes from summing over the two mutually exclusive and equiprobable ways Child 3 can get a D and a d.
$$P(G_{ch3} = DD \mid G_{pop1} = Dd,\ G_{mom2} = Dd) = \tfrac{1}{2} \times \tfrac{1}{2}$$
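The three transmission probabilities for a Dd × Dd mating can be reproduced by enumerating which allele each parent passes on. This is a sketch under the Mendelian assumption stated on the slide (each parental allele is transmitted with probability 1/2); the function name and the string encoding of genotypes are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def transmission(father, mother):
    """P(child genotype | parental genotypes) at one locus.

    Genotypes are two-character strings such as "Dd". Each parent passes
    on either allele with probability 1/2, independently of the other parent.
    """
    probs = Counter()
    for paternal, maternal in product(father, mother):
        # Sort so that "dD" and "Dd" are counted as the same genotype.
        child = "".join(sorted((paternal, maternal)))
        probs[child] += Fraction(1, 4)
    return dict(probs)


print(transmission("Dd", "Dd"))
```

For two heterozygous parents the heterozygous child collects two of the four equally likely allele pairs, which is where the factor 2 comes from.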

One Locus: Penetrance Probabilities
Complete penetrance: P(Ph = affected | G = DD) = 1
Incomplete penetrance: P(Ph = affected | G = DD) = 0.8
Independent penetrance model: Pedigree analyses usually suppose that, given the genotype at all loci, and in some cases age and sex, the chance of having a particular phenotype depends only on the genotype at one locus, and is independent of all other factors: genotypes at other loci, environment, genotypes and phenotypes of relatives, etc.

One Locus: Penetrance Probabilities
Age- and sex-dependent penetrance, for a DD male who is 45 years old:
P(Ph = affected | G = DD, sex = male, age = 45 y.o.) = 0.6

One Locus: Putting it All Together
The overall pedigree likelihood is given as
$$L = \prod_{f \in \text{Founders}} P(G_f) \prod_{ch \in \text{Nonfounders}} P(G_{ch} \mid G_{pop}, G_{mom}) \prod_{i \in \{\text{Founders},\,\text{Nonfounders}\}} P(Ph_i \mid G_i)$$
If founder or non-founder genotypes are unavailable or missing, we sum over all possible genotypes for those individuals to obtain the likelihood
$$L = \sum_{G_f,\, G_{ch}} \; \prod_{f \in \text{Founders}} P(G_f) \prod_{ch \in \text{Nonfounders}} P(G_{ch} \mid G_{pop}, G_{mom}) \prod_{i \in \{\text{Founders},\,\text{Nonfounders}\}} P(Ph_i \mid G_i)$$

One Locus: LOD Score
Null hypothesis: the disease locus is unlinked to the marker locus being tested.
Alternative hypothesis: the disease locus is linked to the marker locus being tested.
LOD score = $\log_{10}$(likelihood under the alternative hypothesis) $-$ $\log_{10}$(likelihood under the null hypothesis)
The likelihood under the null hypothesis can be obtained by summing the pedigree likelihood over all possible genotypes of all pedigree individuals. This is computationally expensive, but efficient algorithms exist.

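One Locus Example
[Figure: founders 1 and 2 with children 3, 4 and 5. From the factors below, both founders are Dd, child 3 is dd, child 4 is Dd and child 5 is DD.]
Assume penetrances P(affected | dd) = 0.1, P(affected | Dd) = 0.3, P(affected | DD) = 0.8. Allele D has frequency 0.01.
The probability of this pedigree is given as
(2 × 0.01 × 0.99 × 0.7) × (2 × 0.01 × 0.99 × 0.3) × (1/2 × 1/2 × 0.9) × (2 × 1/2 × 1/2 × 0.7) × (1/2 × 1/2 × 0.8)
Each factor combines a founder or transmission probability with a penetrance term: the penetrance itself for an affected individual, one minus the penetrance for an unaffected one.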
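The product above can be reproduced with a small Python sketch of the pedigree likelihood. The pedigree encoding is an assumption read off the factors: both founders are Dd, one unaffected and one affected, and the three children are dd (unaffected), Dd (unaffected) and DD (affected). Which founder is the affected one does not change the product. All function and variable names are illustrative. The last function sums over every genotype assignment by brute force, which is the fallback when genotypes are missing.

```python
from itertools import product

GENOTYPES = ("DD", "Dd", "dd")
FREQ_D = 0.01
PENETRANCE = {"DD": 0.8, "Dd": 0.3, "dd": 0.1}  # P(affected | genotype)

# (name, father, mother, affected); founders have no parents in the pedigree.
# Assumed layout: founder 2 is the affected one.
PEDIGREE = [
    ("1", None, None, False),
    ("2", None, None, True),
    ("3", "1", "2", False),
    ("4", "1", "2", False),
    ("5", "1", "2", True),
]


def founder_prob(genotype):
    """Hardy-Weinberg genotype probability for a founder."""
    p, q = FREQ_D, 1.0 - FREQ_D
    return {"DD": p * p, "Dd": 2 * p * q, "dd": q * q}[genotype]


def transmit_prob(child, father, mother):
    """Mendelian P(child genotype | parental genotypes)."""
    hits = sum(
        1 for a, b in product(father, mother)
        if "".join(sorted((a, b))) == child
    )
    return hits / 4


def phenotype_prob(affected, genotype):
    pen = PENETRANCE[genotype]
    return pen if affected else 1.0 - pen


def likelihood(genotypes):
    """Joint probability of the phenotypes and one full genotype assignment."""
    total = 1.0
    for name, father, mother, affected in PEDIGREE:
        g = genotypes[name]
        if father is None:
            total *= founder_prob(g)
        else:
            total *= transmit_prob(g, genotypes[father], genotypes[mother])
        total *= phenotype_prob(affected, g)
    return total


def marginal_likelihood():
    """Sum over all genotype assignments (used when genotypes are unobserved)."""
    names = [row[0] for row in PEDIGREE]
    return sum(
        likelihood(dict(zip(names, combo)))
        for combo in product(GENOTYPES, repeat=len(names))
    )


observed = {"1": "Dd", "2": "Dd", "3": "dd", "4": "Dd", "5": "DD"}
print(likelihood(observed))
print(marginal_likelihood())
```

The brute-force sum visits 3^5 = 243 assignments here; its cost grows exponentially with pedigree size, which is why dedicated algorithms are used in practice.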

One Locus Analysis
Two algorithms:
The general strategy of beginning with founders, then non-founders, and multiplying and summing as appropriate, has been codified in what is known as the Elston-Stewart algorithm for calculating probabilities over pedigrees. It is one of the two widely used approaches.
The other is called the Lander-Green algorithm and takes a quite different approach: it uses hidden Markov models to model multiple loci.

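Two Loci: Linkage and Recombination
[Figure: parent 1 is DD at the disease locus and TT at the marker locus, parent 2 is dd and tt, and their son 3 is Dd and Tt.]
Son 3 produces sperm with D-T, D-t, d-T or d-t, in proportions that depend on how often recombination occurs between the two loci.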

Two Loci: Linkage and Recombination
Son produces sperm with DT, Dt, dT or dt in proportions (1-θ)/2 : θ/2 : θ/2 : (1-θ)/2.
  θ = 1/2: independent assortment (cf. Mendel), unlinked loci
  θ < 1/2: linked loci
  θ ≈ 0: tightly linked loci
Note: θ > 1/2 is never observed!

Two Loci: Linkage and Recombination
Son produces sperm with DT, Dt, dT or dt. If the loci are linked,
  D-T and d-t are parental haplotypes
  D-t and d-T are recombinant haplotypes
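A minimal Python sketch of the gamete proportions as a function of θ, assuming the son's phase is D-T / d-t (so D-T and d-t are parental) and the standard split of (1-θ)/2 for each parental haplotype and θ/2 for each recombinant one. The function name is illustrative.

```python
def gamete_proportions(theta):
    """Gamete frequencies from a double heterozygote with phase D-T / d-t.

    theta is the recombination fraction between the two loci, in [0, 1/2].
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    parental = (1.0 - theta) / 2
    recombinant = theta / 2
    return {
        "D-T": parental,
        "d-t": parental,
        "D-t": recombinant,
        "d-T": recombinant,
    }


for theta in (0.0, 0.1, 0.5):
    print(theta, gamete_proportions(theta))
```

At θ = 1/2 all four haplotypes have probability 1/4, which recovers independent assortment; at θ = 0 only the parental haplotypes are produced.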

Phase
Phase is known for an individual if you can tell whether the gamete was parental or recombinant.
Phase is unknown if you cannot tell whether the gamete was parental or recombinant.

Two Loci: Phase Known Pedigree
[Figure: three-generation pedigree with genotypes at the D/d and T/t loci for the grandparents, the parents and their children.]
Recombination is only discernible in the father.
Here $\hat{\theta} = 1/4$ (why?)
This is called the phase-known double backcross pedigree.
What if the grandparents' genotypes are not known?
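One way to answer the "why?" is the following derivation. It assumes the figure shows four children of whom exactly one received a recombinant haplotype from the father, which is the reading consistent with the stated estimate; the symbols $n$ and $k$ are introduced here for illustration.

```latex
% n = 4 informative (paternal) meioses, k = 1 of them recombinant.
% Each meiosis is recombinant with probability \theta, independently.
\begin{align*}
L(\theta) &\propto \theta^{k} (1-\theta)^{n-k} = \theta\,(1-\theta)^{3}, \\
\frac{d}{d\theta} \log L(\theta) &= \frac{1}{\theta} - \frac{3}{1-\theta} = 0
  \;\Longrightarrow\; 1 - \theta = 3\theta, \\
\hat{\theta} &= \frac{k}{n} = \frac{1}{4}.
\end{align*}
```

With phase known, the maximum likelihood estimate is simply the observed proportion of recombinants.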

Two Loci: Phase Unknown Pedigree
Suppose the grandparents' genotypes are unavailable in the double backcross pedigree.
[Figure: the same pedigree with the grandparents' genotypes replaced by question marks.]
The daughter received D-T from her father: is it parental or recombinant?
If the father got D-T from one parent and d-t from the other, the daughter's paternally derived haplotype is parental.
If the father got D-t from one parent and d-T from the other, the daughter's paternally derived haplotype is recombinant.

Two Loci: Dealing with Phase
Phase is usually regarded as unknown genetic information.
Sometimes, but not always, phase can be inferred with certainty from genotype data on parents, multiple children and other relatives.
In practice, probabilities must be calculated under all phases compatible with the observed data and added together. This is computationally intensive, especially with multilocus analyses.

Two Loci: Founder Probabilities
Assume linkage equilibrium, i.e. independence of genotypes across the two loci.
Allele frequencies at locus one: D = 0.01 and d = 0.99
Allele frequencies at locus two: T = 0.25 and t = 0.75
Haplotype frequencies:
  DT = 0.01 × 0.25
  Dt = 0.01 × 0.75
  dT = 0.99 × 0.25
  dt = 0.99 × 0.75
Together with Hardy-Weinberg, this implies that for a founder who is Dd at locus one and Tt at locus two,
  P(G = DdTt) = (2 × 0.01 × 0.99) × (2 × 0.25 × 0.75)
              = 2 × (0.01 × 0.25) × (0.99 × 0.75) + 2 × (0.01 × 0.75) × (0.99 × 0.25).
The second line adds the haplotype pair probabilities for the two possible phases, DT/dt and Dt/dT.
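The equality of the two expressions for P(G = DdTt) can be checked numerically. This Python sketch builds haplotype frequencies as products of allele frequencies (the linkage equilibrium assumption) and then adds the two phase-specific haplotype pair probabilities; the variable and function names are illustrative.

```python
locus1 = {"D": 0.01, "d": 0.99}
locus2 = {"T": 0.25, "t": 0.75}


def haplotype_freqs(freqs1, freqs2):
    """Haplotype frequencies under linkage equilibrium (independent loci)."""
    return {a + b: freqs1[a] * freqs2[b] for a in freqs1 for b in freqs2}


hap = haplotype_freqs(locus1, locus2)

# Locus-by-locus: Hardy-Weinberg heterozygote probability at each locus.
via_loci = (2 * locus1["D"] * locus1["d"]) * (2 * locus2["T"] * locus2["t"])

# Haplotype pairs: the two phases DT/dt and Dt/dT, each counted in both orders.
via_haplotypes = 2 * hap["DT"] * hap["dt"] + 2 * hap["Dt"] * hap["dT"]

print(hap)
print(via_loci, via_haplotypes)
```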

Two Loci: Transmission Probabilities
For a given haplotype inheritance:
[Figure: father DT/dt, mother dt/dt and their child.]
Here only the father can exhibit recombination: the mother is uninformative.
$$P(G_{ch} = DT/dt \mid G_{pop} = DT/dt,\ G_{mom} = dt/dt) = P(G_{ch} = DT \mid G_{pop} = DT/dt) \times P(G_{ch} = dt \mid G_{mom} = dt/dt) = \frac{1-\theta}{2} \times 1$$
Sum the probabilities over all possible phases/haplotypes.

Two Loci: Penetrance
In all standard linkage programs, different parts of the phenotype are conditionally independent given all genotypes, and two-locus penetrances split into products of one-locus penetrances.
Assume the penetrances for DD, Dd and dd given earlier, and that T and t are two alleles at a co-dominant marker locus. Then
$$P(Ph_1 = \text{affected},\ Ph_2 = Tt \mid G_1 = DD,\ G_2 = Tt) = P(Ph_1 = \text{affected} \mid G_1 = DD,\ G_2 = Tt) \times P(Ph_2 = Tt \mid G_1 = DD,\ G_2 = Tt) = 0.8 \times 1$$

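Two Loci: Phase Unknown Double Backcross
We assume below that pop is as likely to be DT/dt as Dt/dT.
[Figure: the double backcross pedigree without grandparental genotypes.]
$$P(\text{all data} \mid \theta) = P(\text{parents' data} \mid \theta)\, P(\text{kids' data} \mid \text{parents' data}, \theta)$$
$$= P(\text{parents' data}) \left\{ \frac{1}{2}\left[\left(\frac{1-\theta}{2}\right)^{3} \frac{\theta}{2}\right] + \frac{1}{2}\left[\left(\frac{\theta}{2}\right)^{3} \frac{1-\theta}{2}\right] \right\}$$
This is then maximised in θ, in this case numerically. Here $\hat{\theta} = 0.5$: the bracketed expression is proportional to $\theta(1-\theta)\left[(1-\theta)^{2} + \theta^{2}\right]$, which increases on [0, 1/2], so unlike the phase-known case (where $\hat{\theta} = 1/4$) this single family gives no evidence for linkage.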

Log (base 10) Odds or LOD Scores
Suppose P(data | θ) is the likelihood function of a recombination fraction θ generated by some data, and P(data | 1/2) is the same likelihood when θ = 1/2.
The two likelihoods can be compared through their ratio. This can equally well be done with $\log_{10} L$, i.e.
$$\text{LOD}(\theta^{*}) = \log_{10} P(\text{data} \mid \theta^{*}) - \log_{10} P(\text{data} \mid 1/2)$$
measures the relative strength of the data for θ = θ* (the optimal θ) rather than θ = 1/2.
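As a hedged illustration of the definition, the Python sketch below computes the LOD score for the simplest situation: phase-known meioses, where the likelihood is θ^k (1-θ)^(n-k) for k recombinants among n informative meioses. That likelihood form, the function name and the example counts are assumptions made for this sketch, not something taken from a linkage package.

```python
import math


def lod_phase_known(theta, recombinants, meioses):
    """LOD(theta) = log10 P(data | theta) - log10 P(data | 1/2), phase known."""
    k, n = recombinants, meioses
    likelihood = theta ** k * (1 - theta) ** (n - k)
    if likelihood == 0.0:
        return float("-inf")  # the data are impossible at this theta
    null_likelihood = 0.5 ** n
    return math.log10(likelihood) - math.log10(null_likelihood)


# One recombinant among four meioses, evaluated at theta = k/n = 1/4.
print(lod_phase_known(0.25, 1, 4))

# Ten non-recombinant meioses evaluated at theta = 0.
print(lod_phase_known(0.0, 0, 10))
```

The second call equals 10 × log10(2), so ten phase-known meioses without a recombinant are the smallest such sample whose LOD exceeds 3.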

Facts about/interpretation of LOD scores
1. Positive LOD scores suggest stronger support for θ* than for 1/2; negative LOD scores suggest the reverse.
2. Higher LOD scores mean stronger support; lower scores mean the reverse.
3. LODs are additive across independent pedigrees, and under certain circumstances can be calculated sequentially.
4. For a single two-point linkage analysis, the threshold LOD ≥ 3 has become the de facto standard for "establishing linkage", i.e. rejecting the null hypothesis of no linkage.
5. When more than one locus or model is examined, the remark in 4 must be modified, sometimes dramatically.
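Point 3 can be illustrated with a sketch that adds per-family LOD curves at a common θ and then maximises the total. The three families below are hypothetical (recombinants, meioses) counts invented for the example, and the phase-known likelihood θ^k (1-θ)^(n-k) is an assumed simplification.

```python
import math


def lod(theta, recombinants, meioses):
    """Phase-known LOD at theta, for 0 < theta < 1."""
    k, n = recombinants, meioses
    return (k * math.log10(theta) + (n - k) * math.log10(1 - theta)
            - n * math.log10(0.5))


# Hypothetical independent families: (recombinants, informative meioses).
families = [(1, 4), (0, 5), (1, 6)]


def total_lod(theta):
    """LODs add across independent pedigrees at the same theta."""
    return sum(lod(theta, k, n) for k, n in families)


grid = [i / 1000 for i in range(1, 501)]  # 0.001 .. 0.5
theta_hat = max(grid, key=total_lod)

print("theta_hat =", theta_hat)
print("max total LOD =", round(total_lod(theta_hat), 3))
```

In this phase-known setting the summed curve is identical to the LOD of the pooled counts (2 recombinants in 15 meioses), so the maximum sits near 2/15. The total for these invented families stays below 3, which under point 4 would not be enough to declare linkage.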

Assumptions underpinning most 2-point human linkage analyses
Founder frequencies: Hardy-Weinberg, random mating at each locus. Linkage equilibrium across loci, known allele frequencies; founders independent.
Transmission: Mendelian segregation, no mutation.
Penetrance: single locus, no room for dependence on relatives' phenotypes or environment. Known (including phenocopy rate).
Implicit: phenotype and genotype data correct, marker order and location correct.
Comment: Some analyses are robust, others can be very sensitive to violations of some of these assumptions.