Colon Cancer
Cornell Probability Summer School 2006
Simon Tavaré
Lecture 5

Stem Cells
Potten and Loeffler (1990): A small population of relatively undifferentiated, proliferative cells that maintain their population size when they divide, while at the same time producing progeny that enter a dividing transit population within which further rounds of division occur together with differentiation events which resulted ultimately in the production of functional cell types required of the tissue.
Embryonic stem cells (ES): enormous division potential; can produce all differentiated cell types needed by the organism.
Adult stem cells: restricted range of differentiated products (?)

Colon crypts
Stem cells are undifferentiated cells residing in a specific location (niche) in a tissue. They:
can produce a variety of somatic cell types needed for tissue renewal;
produce intermediate (Transit Amplifying, TA) cells that can divide rapidly and differentiate into various types of tissue cell;
must be maintained, as only they can effect continuous renewal.
The colon is lined with 15 million crypts.
Problem: there is no way to identify stem cells.
Colon crypts
How do stem cells maintain their numbers?
Model A (Deterministic): a small number of stem cells in the niche; each generates a single stem cell and a single TA cell on division (asymmetric division); each stem cell is immortal.
Model B (Stochastic): many stem cells in a niche; each stem cell produces 0, 1 or 2 stem cells (and 2, 1 or 0 TA cells) on division.
How do we distinguish between the two? We need a marker that changes rapidly during cell division. Mutations in DNA? We use CpG methylation patterns: epigenetic changes that survive mitotic division.

Methylation
CpG islands: in the human genome, CpG dinucleotides are relatively rare. CpG pairs undergo a process called methylation that modifies the C nucleotide. A methylated C can (with relatively high probability) mutate to a T. Promoter regions are CpG rich; these regions are not methylated, and thus mutate less often.
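To make the contrast between the two models concrete, here is a minimal Python sketch that follows the founding lineages of an N-cell niche. Model B is implemented with offspring counts in {0, 1, 2} conditioned to sum to N, borrowing the Cannings formulation given later in this lecture. The parameter p (probability of exactly one stem-cell offspring), the function names and the generation counts are illustrative assumptions, not the authors' code.

```python
import random


def model_b_generation(n, p, rng):
    """Offspring stem-cell counts for one round of division in Model B.

    Each count is 1 with probability p and 0 or 2 with probability (1 - p) / 2
    each; draws are rejected until the counts sum to n, so the niche size is
    constant.
    """
    while True:
        counts = [1 if rng.random() < p else rng.choice((0, 2)) for _ in range(n)]
        if sum(counts) == n:
            return counts


def surviving_lineages(n, generations, model, p=0.95, seed=1):
    """Number of founding stem-cell lineages still present after `generations`."""
    if model not in ("A", "B"):
        raise ValueError("model must be 'A' or 'B'")
    if model == "A":
        # Asymmetric division: every stem cell always leaves exactly one stem
        # cell, so all n founding lineages persist.
        return n
    rng = random.Random(seed)
    labels = list(range(n))  # each stem cell carries the label of its founder
    for _ in range(generations):
        counts = model_b_generation(n, p, rng)
        # A cell with count c leaves c stem-cell daughters carrying its label.
        labels = [label for label, c in zip(labels, counts) for _ in range(c)]
    return len(set(labels))


print(surviving_lineages(8, 2000, "A"))  # 8: no lineage is ever lost
print(surviving_lineages(8, 2000, "B"))  # typically far fewer than 8
```

Under Model A every founding lineage survives indefinitely, whereas under Model B lineages are lost by drift, so the stem cells of a crypt come to share recent common ancestors. A marker that changes rapidly during division is what lets us see this difference in the data.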
Fragile X Syndrome

Methylation...
causes repression of gene expression. CpG islands are often located around promoters of housekeeping genes; these are not usually methylated. Inactive genes are often methylated.

Methylation patterns
Methylation patterns vary with time. They can be detected by bisulfite sequencing: bisulfite treatment changes unmethylated C into U, but leaves methylated C alone. Sequencing identifies methylated sites as C and unmethylated sites as T.

Data
We studied methylation patterns in three genes not expressed in colon crypt cells: MYOD1 (5), CSX (8), BGN (9): X-chromosome locus, 130 bp island.
7 male patients; 7 to 9 normal crypts per person; 8 to 24 molecules studied per crypt.
[Figures: Experimental Method; Methylation in a Single Individual]
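The bisulfite readout rule can be sketched in a few lines of Python. This is a simplified, hypothetical model of the chemistry: it treats a single forward strand, assumes no PCR or sequencing errors, and assumes every non-CpG C is unmethylated. The helper names and the example sequence are illustrative.

```python
def bisulfite_read(sequence, methylated_positions):
    """Read expected after bisulfite treatment, PCR and sequencing.

    Unmethylated C is converted to U, which is read as T after amplification;
    methylated C (positions in `methylated_positions`) is left as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


def methylation_tag(reference, read):
    """0/1 methylation pattern at the CpG sites of the reference sequence:
    1 if the read still shows C (methylated), 0 if it shows T (unmethylated)."""
    cpg_sites = [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]
    return [1 if read[i] == "C" else 0 for i in cpg_sites]


reference = "ACGTTCGACCGA"  # hypothetical sequence with CpG sites at positions 1, 5, 9
read = bisulfite_read(reference, methylated_positions={1, 9})
print(read)                              # ACGTTTGATCGA
print(methylation_tag(reference, read))  # [1, 0, 1]
```

The resulting 0/1 vector is the methylation pattern (tag) of one molecule; a crypt sample is a list of such tags.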
A stochastic model
Start with N stem cells in the crypt. Assume a constant number N of stem cells after every replication. Each cell that is not a stem cell is a TA cell. TA cells initiate independent branching processes that grow for a fixed number of replications and then die out. The branching mechanism reflects the fact that a crypt contains about 2000 cells.

A Cannings model
Start with N stem cells in the crypt. Let $X_1, \dots, X_N$ be iid with $\mathbb{P}(X_i = 1) = p$ and $\mathbb{P}(X_i = 0) = \mathbb{P}(X_i = 2) = (1-p)/2$.
Assume a constant number of stem cells after every replication. The joint distribution of the numbers $\nu_1, \dots, \nu_N$ of stem cells copied from stem cells $1, \dots, N$ is given by
$$\mathcal{L}(\nu_1, \dots, \nu_N) = \mathcal{L}\Big(X_1, \dots, X_N \,\Big|\, \sum_{i=1}^{N} X_i = N\Big).$$

Describing methylation patterns
After g generations the crypt contains a number of cells, from which we sample a few for bisulfite treatment, PCR amplification, cloning and sequencing. Superimpose the effect of changes in methylation during mitotic division.
Aim: infer something about the number of stem cells, given the observed methylation patterns.
A number of summary statistics: percent methylation; number of unique tags ("alleles"); pairwise difference statistics; number of segregating sites.

Which Model?
Reminder: ABC
Table 2.
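The summary statistics listed above are simple functions of a sample of methylation patterns. A minimal Python sketch, assuming each sampled molecule is encoded as a string of 0s and 1s over the CpG sites (1 = methylated) and using the mean pairwise Hamming distance as the pairwise difference statistic; the encoding, the function name and the example patterns are made up for illustration.

```python
from itertools import combinations


def summary_statistics(patterns):
    """Summaries of one crypt sample; patterns are equal-length '0'/'1' strings."""
    n_sites = len(patterns[0])
    methylated = sum(p.count("1") for p in patterns)
    pairs = list(combinations(patterns, 2))
    # Hamming distance between two patterns = number of CpG sites that differ.
    distances = [sum(a != b for a, b in zip(x, y)) for x, y in pairs]
    return {
        "percent_methylation": 100.0 * methylated / (len(patterns) * n_sites),
        "unique_tags": len(set(patterns)),
        "mean_pairwise_difference": sum(distances) / len(distances) if distances else 0.0,
        # A site is segregating if the sampled patterns do not all agree there.
        "segregating_sites": sum(
            1 for j in range(n_sites) if len({p[j] for p in patterns}) > 1
        ),
    }


crypt_sample = ["00000000", "01000000", "01000100", "00000000"]
print(summary_statistics(crypt_sample))
```

For this invented sample of four 8-site patterns, 3 of 32 sites are methylated (9.375%), there are 3 unique tags, the mean pairwise difference is 7/6, and 2 sites are segregating.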
Observed and expected variance of unique tags per crypt. "Observed" is the observed variance; each model column gives the variance under that stem cell model, as average (CI).

Gene  ID  Observed   2 immortal, p = 1.0  64-cell niche, p = 0.95  256-cell niche, p = 0.89
CSX   A   2.5        0.37 (0.14-0.90)     0.83 (0.24-2.1)          1.1 (0.24-2.7)
CSX   B   2.3        0.47 (0.14-1.1)      0.84 (0.24-2.3)          1.0 (0.24-2.3)
CSX   C   1.8        0.35 (0.11-0.78)     1.0 (0.28-2.3)           1.3 (0.36-2.9)
CSX   D   1.9        0.50 (0.12-1.1)      0.86 (0.21-2.1)          1.1 (0.27-2.8)
CSX   E   2.0        0.44 (0.14-0.90)     0.83 (0.14-2.2)          1.1 (0.24-2.8)
CSX   F   0.94       0.39 (0.19-0.78)     0.94 (0.25-2.2)          1.2 (0.28-2.4)
CSX   G   2.5        0.42 (0.14-1.0)      0.83 (0.14-2.1)          0.98 (0.24-2.3)
CSX   H   1.8        0.45 (0.14-1.0)      0.81 (0.24-2.2)          1.0 (0.24-2.5)
CSX   I   1.3        0.46 (0.14-1.0)      0.99 (0.24-2.2)          1.2 (0.29-3.0)
BGN   D   0.67       0.073 (0-0.33)       0.63 (0.24-1.5)          0.79 (0.24-1.9)
BGN   F   2.3        0.035 (0-0.21)       0.80 (0.21-2.3)          1.0 (0.27-2.6)
BGN   H   4.3        0.029 (0-0.14)       0.75 (0.14-1.9)          0.90 (0.14-2.2)
BGN   I   1.1        0.052 (0-0.24)       0.68 (0.14-1.7)          0.84 (0.14-2.0)
BGN   M   1.4        0.031 (0-0.21)       0.67 (0.21-1.6)          0.90 (0.21-2.0)

Can simulate (forwards) from this model easily, so...
1. Simulate θ from the prior π.
2. Simulate data D_sim from the model with parameter θ.
3. Accept θ if d(D_sim, D) is small.
4. Start over.
The art is in choosing summary statistics.
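The four ABC steps translate almost line for line into a rejection sampler. The Python sketch below is generic; the toy problem (a binomial count with a uniform prior on the success probability) is an assumed example chosen only because its exact posterior, Beta(k + 1, n - k + 1) with mean (k + 1)/(n + 2), is known, so the output can be checked. With tolerance 0 and a discrete statistic, ABC reduces to exact rejection sampling.

```python
import random


def abc_rejection(observed, prior_sampler, simulator, distance, tolerance, n_draws, seed=0):
    """Basic ABC rejection sampler; returns the accepted parameter values."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                      # 1. simulate theta from the prior
        simulated = simulator(theta, rng)               # 2. simulate data given theta
        if distance(simulated, observed) <= tolerance:  # 3. accept if close enough
            accepted.append(theta)
    return accepted                                     # 4. each loop pass "starts over"


# Toy check (assumed example): k successes in n Bernoulli(theta) trials,
# theta ~ U(0, 1). The exact posterior mean is (k + 1) / (n + 2).
n_trials, k_obs = 20, 6
posterior = abc_rejection(
    observed=k_obs,
    prior_sampler=lambda rng: rng.random(),
    simulator=lambda theta, rng: sum(rng.random() < theta for _ in range(n_trials)),
    distance=lambda sim, obs: abs(sim - obs),
    tolerance=0,
    n_draws=20000,
)
print(len(posterior), sum(posterior) / len(posterior), (k_obs + 1) / (n_trials + 2))
```

In the crypt application θ would be the stem cell parameters, the simulator the forward stem cell and methylation model, and the distance a comparison of summary statistics rather than of raw data.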
ABC Approach

Back to crypts...
Priors: $N \sim U(1, 100)$, $P \sim U(0.9, 1.0)$, $\mu \sim U(10^{-5}, 5 \times 10^{-5})$
Summary statistics: number of unique tags, percent methylation, mean distance, number of segregating sites
What is close? Small relative error for each summary statistic $i$:
$$d_i = \frac{|s_{i,\mathrm{sim}} - s_{i,\mathrm{obs}}|}{s_{i,\mathrm{obs}} + 1}$$
Small run: 755 points
[Figures: Posterior for N; Posterior for P]
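A sketch of the crypt-specific ingredients: drawing (N, P, µ) from the priors above and the relative-error comparison of summary statistics. Two details are assumptions rather than taken from the slide: N is drawn as an integer uniformly from 1 to 100, and a simulated data set is declared close when every d_i is at most a tolerance eps (the slide does not say how the d_i are combined). The statistic values in the example are invented.

```python
import random


def sample_prior(rng):
    """One draw of (N, P, mu) from the priors on the slide.

    Assumption: N is an integer drawn uniformly from 1, ..., 100.
    """
    return {
        "N": rng.randint(1, 100),
        "P": rng.uniform(0.9, 1.0),
        "mu": rng.uniform(1e-5, 5e-5),
    }


def relative_errors(s_sim, s_obs):
    """d_i = |s_sim_i - s_obs_i| / (s_obs_i + 1), one value per summary statistic."""
    return [abs(sim - obs) / (obs + 1) for sim, obs in zip(s_sim, s_obs)]


def is_close(s_sim, s_obs, eps):
    """Hypothetical acceptance rule: accept if every relative error is <= eps."""
    return max(relative_errors(s_sim, s_obs)) <= eps


# Statistics in the order: unique tags, percent methylation, mean distance,
# segregating sites. The values are invented for illustration.
s_obs = [3, 12.5, 1.2, 4]
s_sim = [4, 10.0, 1.0, 4]
print(relative_errors(s_sim, s_obs))
print(is_close(s_sim, s_obs, eps=0.3), is_close(s_sim, s_obs, eps=0.2))  # True False
print(sample_prior(random.Random(0)))
```

The + 1 in the denominator keeps d_i finite when an observed statistic is 0, for example a crypt sample with no segregating sites.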
Pierre Nicolas, INRA

A Continuous-time Model
Say a stem cell dies if it is replaced by two TA cells. The life span of a stem cell is Exponential with mean $1/\gamma$. When a cell dies, another stem cell having two stem cell offspring is copied to replace it.
The genealogy of stem cells looks like a coalescent. The crypt contains N equal-sized subpopulations, each the progeny of a single stem cell. A pair of stem cells coalesces at rate $2\gamma/(N-1)$: either cell of the pair dies at rate $\gamma$, and its replacement is copied from the other with probability $1/(N-1)$.

Modelling Methylation Patterns
All islands are unmethylated at the birth of the individual.
Independent sites model: $\mu = (\mu_m, \mu_u)$, the methylation and demethylation rates.
Context-dependent model: methylation/demethylation events occur at a rate that depends on the number of methylated sites.
$\varepsilon$: sequencing error rate per site per molecule.

Genealogy of TA cells
TA cells have a small, fixed number of divisions (g). The time scale of the process is expressed in arbitrary units. $\eta$ is the rate of the methylation process relative to the time scale of the TA part. Take $\eta \propto \mu$.
[Figure: Descendants of a Stem Cell, g = 5: stem cell; stages 1 to 5 (cells from the same stem cell progeny); dead cells removed from the crypt]
Genealogy modeled as a coalescent with expansion.
[Figure: Star-like Genealogy of Sample]

Parameterization and Priors
N uniform
$\lambda = \gamma/(N-1)$; $\lambda^{-1}$ uniform
$\nu = \mu/\lambda$; each component log-normal$(0, \sigma)$; $\sigma$ exponential, mean 1
$\eta = \alpha\nu$; $\alpha$ exponential, mean 1
g uniform
$\varepsilon \sim U(0, 1)$
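The pair coalescence rate can be checked by simulation. Each of the N stem cells dies at rate γ, so replacement events occur at total rate Nγ; a tagged pair merges when one of them is the cell that dies and the replacement is copied from the other, which has probability 2/(N(N - 1)) per event. The total rate is therefore Nγ · 2/(N(N - 1)) = 2γ/(N - 1) = 2λ, and the mean coalescence time is (N - 1)/(2γ). A minimal Python sketch with illustrative values N = 10 and γ = 1; the function name is hypothetical.

```python
import random


def pair_coalescence_time(n, gamma, rng):
    """Waiting time (backwards in time) until two tagged stem cells find a
    common ancestor.

    Replacement events occur at total rate n * gamma. At each event a uniformly
    chosen cell dies and is replaced by a copy of a uniformly chosen other cell.
    By exchangeability, the current ancestral pair can always be labelled 0 and 1.
    """
    t = 0.0
    while True:
        t += rng.expovariate(n * gamma)
        dying = rng.randrange(n)
        parent = rng.randrange(n - 1)
        if parent >= dying:
            parent += 1  # uniform over the n - 1 cells other than `dying`
        if {dying, parent} == {0, 1}:
            return t


n, gamma = 10, 1.0
lam = gamma / (n - 1)  # lambda = gamma / (N - 1)
rng = random.Random(42)
times = [pair_coalescence_time(n, gamma, rng) for _ in range(2000)]
print(sum(times) / len(times), 1 / (2 * lam))  # simulated vs theoretical mean (N - 1) / (2 gamma)
```

Working with λ = γ/(N - 1) absorbs N into the time scale, so the pairwise coalescence rate is simply 2λ whatever the niche size.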
MCMC algorithm
Find the posterior of $\theta = (N, \lambda, g, \sigma, \nu, \alpha, \varepsilon)$ given methylation patterns X.
And then a miracle occurs!

MCMC
Non-stationary coalescent.
Augmented state space: $(\theta, \Lambda, Y)$, where $\Lambda$ is the collection of genealogies of methylation patterns and Y denotes the methylation patterns in the nodes of $\Lambda$.
Updating N is hard: embed the model in one where N is allowed to vary by ±1 between crypts.
Moves around $\Lambda$ via Wilson/Balding (avoids peeling); the values in Y are used to propose changes.
Many different types of updates are combined in this approach.
Run for 5,000,000 iterations, and record $(\theta, \Lambda, Y)$ every 100 steps after the first 500,000.
No apparent convergence problems.
PCR: 10 days per run

Simulated Dataset I
[Figure: posterior density of N, N = 6]

Simulated Dataset II
[Figure: posterior density of N, N = 24]

Predictive assessment of model fitness
Five within-crypt statistics: number of distinct patterns; number of polymorphic sites; average distance between patterns; number of unmethylated patterns; number of singletons.
Compare the distribution of the inter-crypt average and standard deviation with the actual values.
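The predictive check compares inter-crypt summaries of a within-crypt statistic with their distribution over data sets simulated from the fitted model. A minimal Python sketch, assuming the within-crypt statistic (for example the number of distinct patterns) has already been computed for every crypt. Summarising the comparison as the fraction of replicates at least as large as the observed value is an illustrative choice, not necessarily how the authors displayed it, and all numbers below are invented.

```python
import statistics


def intercrypt_summary(per_crypt_values):
    """Inter-crypt average and standard deviation of one within-crypt statistic."""
    return statistics.mean(per_crypt_values), statistics.stdev(per_crypt_values)


def predictive_fractions(observed, replicates):
    """Fractions of replicated data sets whose inter-crypt average and sd are at
    least as large as the observed ones (values near 0 or 1 suggest poor fit)."""
    obs_mean, obs_sd = intercrypt_summary(observed)
    summaries = [intercrypt_summary(rep) for rep in replicates]
    frac_mean = sum(m >= obs_mean for m, _ in summaries) / len(summaries)
    frac_sd = sum(s >= obs_sd for _, s in summaries) / len(summaries)
    return frac_mean, frac_sd


# Invented example: number of distinct patterns in each of 4 crypts, observed
# and in 3 data sets simulated from the fitted model.
observed = [3, 4, 2, 5]
replicates = [[3, 3, 3, 3], [4, 5, 3, 7], [2, 4, 6, 8]]
print(predictive_fractions(observed, replicates))
```

In practice one would use many replicates (for example one per retained MCMC sample) and repeat the comparison for each of the five statistics.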
[Figures: predictive assessment under the independent methylation process and under the dependent methylation process; panels show the intercrypt average and intercrypt sd of # distinct patterns, # polymorphic sites, average distance, # unmethylated, # singletons]
Patient X
Robustness
[Figure: intercrypt average and intercrypt sd of # distinct patterns, # polymorphic sites, average distance, # unmethylated, # singletons]

Posteriors: shape of genealogy
[Figure: posterior densities, including those of N and g]
Posteriors: polymorphism, given genealogy
[Figure: posterior densities of µ and of the number of methylated sites]

Current work
Generating much bigger data sets: more CpG islands, different lengths; experimental issues, e.g. PCR errors.
Getting data from other tissue types: have endometrium, small intestine, hair; doing blood, brain, heart.
Develop better markers?
Model spatial structure in crypts.

Inference about crypts: ABC approach

References
Nicolas P, Shibata D & Tavaré S. Posterior inference on the stem cell population of the human colon crypt through analysis of methylation patterns. In preparation.
Shibata D & Tavaré S. Counting divisions in a human somatic cell tree: how, what and why. Cell Cycle, 5, 610-614, 2006.
Kim JY, Tavaré S & Shibata D. Counting human somatic cell replications: methylation mirrors human endometrial stem cell divisions. Proc Natl Acad Sci USA, 102, 17739-17744, 2005.
Kim JY, Siegmund KD, Tavaré S & Shibata D. Age-related human small intestine methylation: evidence for stem cell niches. BMC Medicine, 3:10, 2005.
Calabrese P, Mecklin JP, Järvinen HJ, Aaltonen LA, Tavaré S & Shibata D. Numbers of mutations to different types of colorectal cancer. BMC Cancer, 5:126, 2005.
Calabrese P, Tavaré S & Shibata D. Pre-tumor progression: clonal evolution of human stem cell populations. Am J Pathol, 164, 1337-1346, 2004.