Cornell Probability Summer School 2006


Colon Cancer
Cornell Probability Summer School 2006
Simon Tavaré
Lecture 5

Stem Cells
Potten and Loeffler (1990): A small population of relatively undifferentiated, proliferative cells that maintain their population size when they divide, while at the same time producing progeny that enter a dividing transit population within which further rounds of division occur together with differentiation events which resulted ultimately in the production of functional cell types required of the tissue
- Embryonic stem (ES) cells: enormous division potential; can produce all differentiated cell types needed by the organism
- Adult stem cells: restricted range of differentiated products (?)

Colon crypts
Stem cells are undifferentiated cells residing in a specific location (niche) in a tissue. They
- can produce a variety of somatic cell types needed for tissue renewal
- produce intermediate (Transit Amplifying, TA) cells that can divide rapidly and differentiate into various types of tissue cell
- must be maintained, as only they can effect continuous renewal
Colon lined with 15 million crypts
Problem: no way to identify stem cells

Colon crypts

How do stem cells maintain their numbers?
Model A (Deterministic)
- Small number of stem cells in niche
- Each generates a single stem cell and a single TA cell on division (asymmetric division)
- Each stem cell is immortal
Model B (Stochastic)
- Many stem cells in a niche
- Each stem cell produces 0, 1 or 2 stem cells (and 2, 1 or 0 TA cells) on division

How do we distinguish between the two?
- We need a marker that changes rapidly during cell division
- Mutations in DNA?
- We use CpG methylation patterns: epigenetic changes that survive mitotic division

Methylation
CpG islands
- In the human genome, CpG dinucleotides are relatively rare
- CpG pairs undergo a process called methylation that modifies the C nucleotide
- A methylated C can (with relatively high probability) mutate to a T
- Promoter regions are CpG rich
- These regions are not methylated, and thus mutate less often
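The relative rarity of CpG dinucleotides can be quantified with the usual observed/expected ratio: the number of CG dinucleotides, divided by the number expected if C and G occurred independently at their observed frequencies. The following is a minimal sketch; the function name `cpg_obs_exp` and the example sequences are illustrative and not part of the lecture.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Values well below 1 indicate CpG depletion; CpG islands have
    ratios closer to (or above) 1.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # no CpG is possible without both C and G
    # "CG" cannot overlap itself, so str.count sees every occurrence.
    n_cpg = seq.count("CG")
    return n_cpg * len(seq) / (n_c * n_g)


if __name__ == "__main__":
    print(cpg_obs_exp("ACGTACGT"))  # CpG-rich toy sequence
    print(cpg_obs_exp("CATGCATG"))  # same base composition, no CpG
```

Applied in a sliding window along a sequence, a ratio like this is one common ingredient in locating candidate CpG islands.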

Fragile X Syndrome

Methylation...
- Causes repression of gene expression
- CpG islands are often located around promoters of housekeeping genes; these are not usually methylated
- Inactive genes are often methylated

Methylation patterns
- Methylation patterns vary with time
- Can be detected by bisulfite sequencing
- Bisulfite treatment changes unmethylated C into U, but leaves methylated C alone
- Sequencing identifies methylated sites as C, unmethylated as T

Data
We studied methylation patterns in three genes not expressed in colon crypt cells
- MYOD1 (5), CSX (8), BGN (9)
- X-chromosome locus, 130 bp island
- 7 male patients
- 7-9 normal crypts per person
- 8-24 molecules studied per crypt

Experimental Method

Methylation in a Single Individual
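The bisulfite read-out described above can be sketched in a few lines: unmethylated C is read as T after treatment, methylated C stays C, and the methylation pattern of a molecule is the 0/1 vector over the CpG sites of the reference. The helper names and the toy reference sequence below are hypothetical, chosen only for illustration.

```python
def cpg_sites(reference):
    """0-based positions of the C in every CG dinucleotide."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def bisulfite_convert(reference, methylated):
    """Simulate a sequenced read after bisulfite treatment.

    Unmethylated C becomes U and is sequenced as T;
    C at a position listed in `methylated` is left alone.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(reference)
    )


def call_pattern(reference, read):
    """Methylation pattern at the CpG sites: 1 = methylated (C), 0 = unmethylated (T)."""
    pattern = []
    for i in cpg_sites(reference):
        if read[i] == "C":
            pattern.append(1)
        elif read[i] == "T":
            pattern.append(0)
        else:
            raise ValueError(f"unexpected base {read[i]!r} at CpG position {i}")
    return pattern


if __name__ == "__main__":
    ref = "ACGTCGAC"                    # toy reference with CpG sites at 1 and 4
    read = bisulfite_convert(ref, {1})  # only the first CpG is methylated
    print(read, call_pattern(ref, read))
```

Each sequenced molecule then contributes one such 0/1 tag, and the collection of tags from a crypt is the raw data for the summary statistics used later.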

A stochastic model
- Start with N stem cells in crypt
- Assume constant number N of stem cells after every replication
- Each cell that is not a stem cell is a TA cell
- TA cells initiate independent branching processes that grow for a fixed number of replications and then die out
- The branching mechanism reflects the fact that a crypt contains about 2000 cells

A Cannings model
- Start with N stem cells in crypt
- $X_1, \dots, X_N$ iid with $\mathbb{P}(X_i = 1) = p$, $\mathbb{P}(X_i = 0) = \mathbb{P}(X_i = 2)$
- Assume a constant number of stem cells after every replication
- The joint distribution of the numbers $\nu_1, \dots, \nu_N$ of stem cells copied from stem cells $1, \dots, N$ is given by
  $\mathcal{L}(\nu_1, \dots, \nu_N) = \mathcal{L}(X_1, \dots, X_N \mid \sum_i X_i = N)$

Describing methylation patterns
- After g generations the crypt contains a number of cells, from which we sample a few for bisulfite treatment, PCR amplification, cloning and sequencing
- Superimpose effect of changes in methylation during mitotic division
- Aim: infer something about the number of stem cells, given the observed methylation patterns
- A number of summary statistics:
  - percent methylation
  - number of unique tags ("alleles")
  - pairwise difference statistics
  - number of segregating sites

Which Model?
Table 2. Observed and expected variance of unique tags per crypt. The three stem cell model columns give the variance under the model, average (CI).

Gene  ID  Observed   2 immortal,          64-cell niche,      256-cell niche,
          variance   p = 1.0              p = 0.95            p = 0.89
CSX   A   2.5        0.37 (0.14-0.90)     0.83 (0.24-2.1)     1.1 (0.24-2.7)
CSX   B   2.3        0.47 (0.14-1.1)      0.84 (0.24-2.3)     1.0 (0.24-2.3)
CSX   C   1.8        0.35 (0.11-0.78)     1.0 (0.28-2.3)      1.3 (0.36-2.9)
CSX   D   1.9        0.50 (0.12-1.1)      0.86 (0.21-2.1)     1.1 (0.27-2.8)
CSX   E   2.0        0.44 (0.14-0.90)     0.83 (0.14-2.2)     1.1 (0.24-2.8)
CSX   F   0.94       0.39 (0.19-0.78)     0.94 (0.25-2.2)     1.2 (0.28-2.4)
CSX   G   2.5        0.42 (0.14-1.0)      0.83 (0.14-2.1)     0.98 (0.24-2.3)
CSX   H   1.8        0.45 (0.14-1.0)      0.81 (0.24-2.2)     1.0 (0.24-2.5)
CSX   I   1.3        0.46 (0.14-1.0)      0.99 (0.24-2.2)     1.2 (0.29-3.0)
BGN   D   0.67       0.073 (0-0.33)       0.63 (0.24-1.5)     0.79 (0.24-1.9)
BGN   F   2.3        0.035 (0-0.21)       0.80 (0.21-2.3)     1.0 (0.27-2.6)
BGN   H   4.3        0.029 (0-0.14)       0.75 (0.14-1.9)     0.90 (0.14-2.2)
BGN   I   1.1        0.052 (0-0.24)       0.68 (0.14-1.7)     0.84 (0.14-2.0)
BGN   M   1.4        0.031 (0-0.21)       0.67 (0.21-1.6)     0.90 (0.21-2.0)

Reminder: ABC
Can simulate (forwards) from this model easily, so...
- Simulate $\theta$ from prior $\pi$
- Simulate data $D_{sim}$ from model with parameter $\theta$
- Accept $\theta$ if $d(D_{sim}, D)$ is small
- Start over
The art is in choosing summary statistics
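One replication step of the Cannings model can be simulated directly from its definition: draw the $X_i$ independently and keep the draw only if they sum to N. The sketch below assumes that $X_i$ takes values in {0, 1, 2}, so that $\mathbb{P}(X_i = 0) = \mathbb{P}(X_i = 2) = (1 - p)/2$. The function names, and the idea of tracking founder labels to watch lineages die out, are illustrative additions rather than the lecture's implementation.

```python
import random


def cannings_offspring(n, p, rng):
    """Draw (nu_1, ..., nu_n): iid X_i in {0, 1, 2}, conditioned on sum(X) == n.

    Assumes P(X=1) = p and P(X=0) = P(X=2) = (1 - p) / 2.
    Conditioning is done by simple rejection.
    """
    q = (1.0 - p) / 2.0
    while True:
        xs = rng.choices((0, 1, 2), weights=(q, p, q), k=n)
        if sum(xs) == n:
            return xs


def next_generation(cells, p, rng):
    """Replace each stem cell by its nu_i stem-cell copies (population size is kept)."""
    counts = cannings_offspring(len(cells), p, rng)
    children = []
    for cell, k in zip(cells, counts):
        children.extend([cell] * k)
    return children


def surviving_founders(n, p, generations, rng):
    """Number of the n original stem-cell lineages still present after some generations."""
    cells = list(range(n))  # label each founder stem cell
    for _ in range(generations):
        cells = next_generation(cells, p, rng)
    return len(set(cells))


if __name__ == "__main__":
    rng = random.Random(2006)
    print(surviving_founders(64, 1.0, 500, rng))   # p = 1: every lineage persists
    print(surviving_founders(64, 0.95, 500, rng))  # p < 1: lineages are lost by drift
```

With p = 1 every stem cell leaves exactly one stem-cell copy, which reproduces the immortal stem cell case in the table; with p < 1 founder lineages are gradually lost.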

ABC Approach

Back to crypts...
- Priors: $N \sim U(1, 100)$, $P \sim U(0.9, 1.0)$, $\mu \sim U(10^{-5}, 5 \times 10^{-5})$
- Summary statistics: number of unique tags, percent methylation, mean distance, number of segregating sites
- What is close? Small relative error in each summary statistic $s_i$:
  $d = \frac{|s_{i,sim} - s_{i,obs}|}{s_{i,obs} + 1}$
- Small run: 755 points

[Figures: Posterior for N; Posterior for P]
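The rejection scheme with these priors and this notion of closeness might look as follows. This is a sketch under stated assumptions: N is drawn as an integer between 1 and 100, a draw is accepted when the largest relative error over the statistics is within a tolerance (the slides do not say how the per-statistic errors are combined), and `toy_simulate` is a deliberately crude stand-in for the crypt simulator that returns a single statistic, not the model from the lecture.

```python
import random


def relative_errors(sim, obs):
    """Per-statistic relative error |s_sim - s_obs| / (s_obs + 1)."""
    return [abs(s - o) / (o + 1) for s, o in zip(sim, obs)]


def draw_crypt_prior(rng):
    """Priors from the slides; treating N as an integer is an assumption."""
    return {
        "N": rng.randint(1, 100),
        "P": rng.uniform(0.9, 1.0),
        "mu": rng.uniform(1e-5, 5e-5),
    }


def toy_simulate(theta, rng):
    """Hypothetical stand-in simulator: number of distinct stem cells hit
    when 8 molecules are sampled with replacement from N stem cells."""
    return (len({rng.randrange(theta["N"]) for _ in range(8)}),)


def abc_rejection(simulate, observed, draw_prior, tolerance, n_draws, rng):
    """Keep every prior draw whose simulated statistics are close to the observed ones."""
    accepted = []
    for _ in range(n_draws):
        theta = draw_prior(rng)               # simulate theta from the prior
        stats = simulate(theta, rng)          # simulate data, reduce to statistics
        if max(relative_errors(stats, observed)) <= tolerance:
            accepted.append(theta)            # accept theta, then start over
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    kept = abc_rejection(toy_simulate, (3,), draw_crypt_prior, 0.0, 2000, rng)
    print(len(kept), "accepted draws")
```

The accepted values of N, P and mu form an approximate posterior sample; replacing `toy_simulate` with a forward simulation of the crypt model and the four summary statistics gives the scheme outlined on the slide.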

A Continuous-time Model
Pierre Nicolas, INRA
- Say a stem cell dies if it is replaced by two TA cells
- Life span of a stem cell is Exponential, mean $1/\gamma$
- When a cell dies, another stem cell having two stem cell offspring is copied to replace it
- The genealogy of stem cells looks like a coalescent
- Crypt contains N equal-sized subpopulations, each the progeny of a single stem cell
- A pair of stem cells coalesces at rate $2\gamma/(N-1)$

Modelling Methylation Patterns
- All islands unmethylated at birth of individual
- Independent sites model: $\mu = (\mu_m, \mu_u)$ methylation rates
- Context-dependent model: methylation/demethylation events occur at a rate that depends on the number of methylated sites
- $\epsilon$: sequencing error rate per site per molecule

Genealogy of TA cells
Descendants of a Stem Cell
- TA cells have a small, fixed number of divisions (g)
- Time scale of process expressed in arbitrary units
- Rate $\eta$ of methylation process relative to time scale of TA part
- Take $\eta \propto \mu$
- Genealogy modeled as a coalescent with expansion
[Figure, g = 5: a stem cell and its progeny through stages 1 to 5 (cells from a same stem cell progeny); dead cells are removed from the crypt]

Star-like Genealogy of Sample

Parameterization and Priors
- N: uniform
- $\lambda = \gamma/(N-1)$; $\lambda^{-1}$: uniform
- $\nu = \mu/\lambda$: each component is log-normal(0, $\sigma$); $\sigma$: exponential, mean 1
- $\eta = \alpha\nu$; $\alpha$: exponential, mean 1
- g: uniform
- $\epsilon \sim U(0, 1)$
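If each pair of stem-cell lineages coalesces at rate $2\gamma/(N-1)$, then while k lineages remain the total coalescence rate is $\binom{k}{2} \cdot 2\gamma/(N-1)$. The time back to the common ancestor of n sampled lineages is a sum of independent exponential waiting times, with mean $\frac{N-1}{\gamma}(1 - 1/n)$. The sketch below simulates that time and checks it against the closed form; the function names and the numerical values are illustrative assumptions.

```python
import random


def expected_tmrca(n_sample, n_stem, gamma):
    """Mean time to the common ancestor: sum over k of 1 / (C(k,2) * pair_rate)."""
    pair_rate = 2.0 * gamma / (n_stem - 1)
    return sum(
        1.0 / (k * (k - 1) / 2.0 * pair_rate) for k in range(2, n_sample + 1)
    )


def simulate_tmrca(n_sample, n_stem, gamma, rng):
    """One draw of the coalescence time of n_sample stem-cell lineages.

    Requires n_stem >= 2 so that the pairwise rate is defined.
    """
    pair_rate = 2.0 * gamma / (n_stem - 1)
    t = 0.0
    for k in range(n_sample, 1, -1):
        # With k lineages there are k(k-1)/2 pairs, each coalescing at pair_rate.
        t += rng.expovariate(k * (k - 1) / 2.0 * pair_rate)
    return t


if __name__ == "__main__":
    rng = random.Random(5)
    draws = [simulate_tmrca(5, 10, 1.0, rng) for _ in range(20000)]
    print(sum(draws) / len(draws), expected_tmrca(5, 10, 1.0))
```

In units of the mean stem-cell life span $1/\gamma$, the depth of the genealogy grows roughly linearly with N, which is why the diversity of methylation patterns within a crypt carries information about the number of stem cells.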

MCMC algorithm
- Find posterior of $\theta = (N, \lambda, g, \sigma, \nu, \alpha, \epsilon)$ given methylation patterns X
- And then a miracle occurs!

MCMC
- Non-stationary coalescent
- Augmented state space: $(\theta, \Lambda, Y)$
- $\Lambda$ is the collection of genealogies of methylation patterns
- Y denotes the methylation patterns in the nodes of $\Lambda$
- Updating N is hard: embed the model in one where N is allowed to vary by ±1 between crypts
- Moves around $\Lambda$ via Wilson/Balding (avoids peeling); the values in Y are used to propose changes
- Many different types of updates are combined in this approach
- Run for 5,000,000 iterations, and record $(\theta, \Lambda, Y)$ every 100 steps after first 500,000
- No apparent convergence problems
- PCR: 10 days per run

Simulated Dataset I
[Figure: density against N, labelled N=6]

Simulated Dataset II
[Figure: density against N, labelled N=24]

Predictive assessment of model fitness
Five within-crypt statistics:
- number of distinct patterns
- number of polymorphic sites
- average distance between patterns
- number of unmethylated patterns
- number of singletons
Compare distribution of inter-crypt average and standard deviation with actual values
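The five within-crypt statistics can be computed from the sampled methylation tags of one crypt as sketched below. Two readings are assumptions here: "average distance" is taken as the mean pairwise Hamming distance over all pairs of sampled molecules, and a "singleton" is taken to be a pattern seen exactly once in the sample. The function names and the example tags are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of CpG sites at which two tags differ."""
    return sum(x != y for x, y in zip(a, b))


def within_crypt_stats(patterns):
    """Five summary statistics for the tags sampled from one crypt.

    `patterns` is a list of equal-length strings over {"0", "1"},
    one per sequenced molecule (1 = methylated site).
    """
    counts = Counter(patterns)
    n_sites = len(patterns[0])
    pairs = list(combinations(patterns, 2))
    return {
        "distinct": len(counts),
        "polymorphic_sites": sum(
            1 for j in range(n_sites) if len({p[j] for p in patterns}) > 1
        ),
        # Assumption: mean Hamming distance over all pairs of molecules.
        "avg_distance": (
            sum(hamming(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
        ),
        "unmethylated": sum(1 for p in patterns if "1" not in p),
        # Assumption: a singleton is a pattern observed exactly once.
        "singletons": sum(1 for c in counts.values() if c == 1),
    }


if __name__ == "__main__":
    crypt = ["00000", "00000", "10000", "11000"]
    print(within_crypt_stats(crypt))
```

Computing these per crypt, and then taking the average and standard deviation across crypts for both real and posterior-predictive simulated data, gives the comparison described on the slide.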

[Figures: predictive assessment under the independent methylation process and under the dependent methylation process. Each shows the intercrypt average and intercrypt sd of # distinct patterns, # polymorphic sites, ave dist, # unmethylated and # singletons.]

Patient X
[Figure: intercrypt average and intercrypt sd of # distinct patterns, # polymorphic sites, ave dist, # unmethylated and # singletons]

Robustness

Posteriors: shape of genealogy
[Figure: posterior densities of N, $\lambda^{-1}$ and g]

Posteriors: polymorphism, given genealogy
[Figure: posterior densities; axis labels include µ and Nb of methylated sites]

Current work
- Generating much bigger data sets: more CpG islands, different lengths; experimental issues, e.g. PCR errors
- Getting data from other tissue types: have endometrium, small intestine, hair; doing blood, brain, heart
- Develop better markers?
- Model spatial structure in crypts

Inference about crypts: ABC approach

References
- Nicolas P, Shibata D & Tavaré S. Posterior inference on the stem cell population of the human colon crypt through analysis of methylation patterns. In preparation.
- Shibata D & Tavaré S. Counting divisions in a human somatic cell tree: how, what and why. Cell Cycle, 5, 610-614, 2006.
- Kim JY, Tavaré S & Shibata D. Counting human somatic cell replications: Methylation mirrors human endometrial stem cell divisions. Proc Natl Acad Sci USA, 102, 17739-17744, 2005.
- Kim JY, Siegmund KD, Tavaré S & Shibata D. Age-related human small intestine methylation: evidence for stem cell niches. BMC Medicine, 3:10, 2005.
- Calabrese P, Mecklin JP, Järvinen HJ, Aaltonen LA, Tavaré S & Shibata D. Numbers of mutations to different types of colorectal cancer. BMC Cancer, 5:126, 2005.
- Calabrese P, Tavaré S & Shibata D. Pre-tumor progression: clonal evolution of human stem cell populations. Am. J. Pathol., 164, 1337-1346, 2004.