On polyclonality of intestinal tumors

Similar documents
On estimating the polyclonal fraction in lineage-marker studies of tumor origin

On estimating the polyclonal fraction in. lineage-marker studies of tumor origin

Cornell Probability Summer School 2006

Introduction to Quantitative Genomics / Genetics

Characterization of Allele-Specific Copy Number in Tumor Genomes

Bayesian Variable Selection and Data Integration for Biological Regulatory Networks

I See Dead People: Gene Mapping Via Ancestral Inference

Haplotype Based Association Tests. Biostatistics 666 Lecture 10

Inference and computing with decomposable graphs

QTL mapping in mice. Karl W Broman. Department of Biostatistics Johns Hopkins University Baltimore, Maryland, USA.

David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis

computational analysis of cell-to-cell heterogeneity in single-cell rna-sequencing data reveals hidden subpopulations of cells

QTL mapping in mice. Karl W Broman. Department of Biostatistics Johns Hopkins University Baltimore, Maryland, USA.

LadyBug a software environment for stochastic epidemic models

Modeling the Safety Effect of Access and Signal Density on. Suburban Arterials: Using Macro Level Analysis Method

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification

SNPs and Hypothesis Testing

Molecular Biology of the Cell ed. By Bruce Albert et al. (free online through ncbi books)

Identification of the essential genes in the M. tuberculosis genome by random transposon mutagenesis

ReCombinatorics. The Algorithmics and Combinatorics of Phylogenetic Networks with Recombination. Dan Gusfield

Lecture 6: GWAS in Samples with Structure. Summer Institute in Statistical Genetics 2015

Estimation of Genetic Recombination Frequency with the Help of Logarithm Of Odds (LOD) Method

Adaptive Design for Clinical Trials

Computational Haplotype Analysis: An overview of computational methods in genetic variation study

Application of Whole-Genome Prediction Methods for Genome-Wide Association Studies: A Bayesian Approach

On the Power to Detect SNP/Phenotype Association in Candidate Quantitative Trait Loci Genomic Regions: A Simulation Study

Learning theory: SLT what is it? Parametric statistics small number of parameters appropriate to small amounts of data

Inferring Social Ties across Heterogeneous Networks

Overview. Methods for gene mapping and haplotype analysis. Haplotypes. Outline. acatactacataacatacaatagat. aaatactacctaacctacaagagat

Experience with Bayesian Clinical Trial Designs for Cardiac Medical Devices

Data Mining. CS57300 Purdue University. March 27, 2018

Transgenesis. Stable integration of foreign DNA into host genome Foreign DNA is passed to progeny germline transmission

Statistical Methods for Quantitative Trait Loci (QTL) Mapping

Nature Genetics: doi: /ng.3254

Introduction to QTL mapping

Evolution of species range limits. Takuji Usui (Angert Lab) BIOL Nov 2017

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes

ANNOUNCEMENTS. HW2 is due Thursday 2/4 by 12:00 pm. Office hours: Monday 12:50 1:20 (ECCH 134)

Genomic models in bayz

Identifying essential genes in M. tuberculosis by random transposon mutagenesis

Summary for BIOSTAT/STAT551 Statistical Genetics II: Quantitative Traits

Effects of protein binding on topological states of DNA minicircle

OriGene GFC-Arrays for High-throughput Overexpression Screening of Human Gene Phenotypes

Recombination, and haplotype structure

HISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY

Designing a Complex-Omics Experiments. Xiangqin Cui. Section on Statistical Genetics Department of Biostatistics University of Alabama at Birmingham

7.03 Final Exam Review 12/19/2006

The Interaction-Interaction Model for Disease Protein Discovery

THREE LEVEL HIERARCHICAL BAYESIAN ESTIMATION IN CONJOINT PROCESS

Software for Automatic Detection and Monitoring of Fluorescent Lesions in Mice

Influence Maximization on Social Graphs. Yu-Ting Wen

Multi-site Time Series Analysis. Motivation and Methodology

Intro to population genetics

POPULATION GENETICS Winter 2005 Lecture 18 Quantitative genetics and QTL mapping

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1

Mazzeo (RAND 2002) Seim (RAND 2006) Grieco (RAND 2014) Discrete Games. Jonathan Williams 1. 1 UNC - Chapel Hill

Multi-level functional principal components analysis models for replicated genomics time series data sets

Lab 1: A review of linear models

MARKOV CHAIN MONTE CARLO SAMPLING OF GENE GENEALOGIES CONDITIONAL ON OBSERVED GENETIC DATA

Business Quantitative Analysis [QU1] Examination Blueprint

Computational Population Genomics

COMPUTATIONAL ANALYSIS OF A MULTI-SERVER BULK ARRIVAL WITH TWO MODES SERVER BREAKDOWN

Microarrays & Gene Expression Analysis

GCTA/GREML. Rebecca Johnson. March 30th, 2017

Genomic Selection with Linear Models and Rank Aggregation

Time-series microarray data simulation modeled with a case-control label

Designing Complex Omics Experiments

Measuring the Mere Measurement Effect in Non-Experimental Field Settings

Molecular Evolution. COMP Fall 2010 Luay Nakhleh, Rice University

Bayesian Designs for Clinical Trials in Early Drug Development

BOIN: A Novel Platform for Designing Early Phase Clinical Tri

TUTORIAL. Revised in Apr 2015

COMPUTER SIMULATIONS AND PROBLEMS

Systematic comparison of CRISPR/Cas9 and RNAi screens for essential genes

Application of Multilevel IRT to Multiple-Form Linking When Common Items Are Drifted. Chanho Park 1 Taehoon Kang 2 James A.

Multiple layers of B cell memory with different effector functions. Ismail Dogan, Barbara Bertocci, Valérie Vilmont, Frédéric Delbos,

Page 78

Forces Determining Amount of Genetic Diversity

Let s call the recessive allele r and the dominant allele R. The allele and genotype frequencies in the next generation are:

Statistical Methods for Network Analysis of Biological Data

Optimizing Synthetic DNA for Metabolic Engineering Applications. Howard Salis Penn State University

Non-parametric optimal design in dose finding studies

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis

Contributed chapter. 1 Summary. M.A. Newton 1

V 1 Introduction! Fri, Oct 24, 2014! Bioinformatics 3 Volkhard Helms!

12/8/09 Comp 590/Comp Fall

Bioinformatics : Gene Expression Data Analysis

Biased competition between Lgr5 intestinal stem cells driven by oncogenic mutation induces clonal expansion

Undergraduate research and graduate school so far

Architecture and Function of Mechanosensitive Membrane Protein Lattices Supplementary Information

Folia Oeconomica Stetinensia DOI: /foli FORECASTING RANDOMLY DISTRIBUTED ZERO-INFLATED TIME SERIES

ESTIMATING GENETIC VARIABILITY WITH RESTRICTION ENDONUCLEASES RICHARD R. HUDSON1

7.03 Final Exam. TA: Alex Bagley Alice Chi Dave Harris Max Juchheim Doug Mills Rishi Puram Bethany Redding Nate Young

Simple haplotype analyses in R

Gene Expression. Chapters 11 & 12: Gene Conrtrol and DNA Technology. Cloning. Honors Biology Fig

Constructing the Search Results Page 2016 Machine Learning Summer School Arequipa, Peru Dilan Gorur

Appendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by

Syllabus for BIOS 101, SPRING 2013

Who Thinks About the Competition? Managerial ability and strategic entry in US local telephone markets by Avi Goldfarb and Mo Xiao (2010)

Introduction to microarrays

Transcription:

Michael A. University of Wisconsin Chaos and Complex Systems April 2006

Thanks Linda Clipson W.F. Dove Rich Halberg Stephen Stanhope Ruth Sullivan Andrew Thliveris

Outline Bio Three statistical questions 1. What is the polyclonal fraction? estimation 2. Is random collision plausible? testing 3. What is the spatial extent of interactions? spatial modeling

Multiple Intestinal Neoplasia (Min) mouse Inherits mutation in the tumor suppressor gene Apc (adenomatous polyposis coli) Presents X intestinal tumors (quantitative trait) Provides an animal model of intestinal cancer Biology of tumor initiation not well understood Full Apc inactivation is an early event in tumor formation Distribution of X is affected by modifier genes

Clonal or polyclonal origin? clonal polyclonal cells of a tumor descend from a single initiated aberrant cell cells descend from multiple initiated cells

Aggregation chimeras enable detection of polyclonality

Intestinal epithelium of B6 chimera: patchwork, tumor B6 Apc Min/+ Mom1 R/R B6 Apc Min/+ Mom1 R/R ROSA26/+

Section shows tumor cells from both embryonic lineages

Heterotypic tumors not infrequent at low tumor multiplicity Counts of small intestinal tumors Mouse %blue Total Heterotypic Pure blue Pure white Ambiguous 1 20 19 5 5 6 3 2 85 24 3 13 6 2 3 20 9 2 2 5 0 4 60 19 3 2 10 4 5 30 24 2 0 21 1 6 50 9 2 2 3 2 7 40 8 5 0 3 0 Total 112 22 24 54 12

Heterotypic polyclonal, but, polyclonal heterotypic Clonality Phenotype monoclonal (C = 1) polyclonal (C > 1) blue (B) P {B (C = 1)} P {B (C > 1)} P(B) white (W ) P {W (C = 1)} P {W (C > 1)} P(W ) heterotypic (HET) 0 P {HET (C > 1)} P(HET) P(C = 1) θ = P(C > 1) 100%

Q1: What fraction θ of tumors are polyclonal? HET (C > 1) P(HET) θ 22/(22 + 24 + 54) = 22% Is there a better estimate or lower bound? Novelli et al. 1996 proposed the lower bound β = 22/(22 + 24) = 48% P(HET) P(HET HOM min )

Novelli s bound β is not valid Theorem: Depending on system, either β θ or β θ. Sketch: Novelli s β is ok if θ = θ = P(C > 1 HET HOM min ), but there is a gap θ < θ, under some regularity. Problem: Estimation of θ is sensitive to assumptions about the mechanisms by which clones are bound into polyclonal tumors.

Random collision: a simple mechanism of polyclonality

Q2: Are the data consistent with random collision? Idea: Considering low tumor multiplicity and small size, the number of collisions should be low on H 0. Test statistic: Number of heterotypic tumors Methods: Unknown, mouse specific numbers of initiated cells Complicated distributions induced on # collided pairs, triples, etc. Overdispersion DETAILS of stochastic geometry approxs & posterior predictive inference approach

Random collision is not plausible

Going further Spatial data and analysis reveal the extent of spatial interaction among clones.

Polyclonal tumors have opportunity to be heterotypic at boundaries Intestinal epithelium adjacent to a tumor...plus more...images from regions adjacent to every tumor in 3 mice

Crypts, rather than cells, are the basic structural units crypt an organized group of proliferating cells - intestinal epithelium formed from O(10 5 ) crypts - crypts are clonal Crypts from non-chimeric mouse Problem: Blue/white images mask crypt arrangement

Crypt arrangement data from non-chimeric mouse On Delaunay triangulation

Computational inference task Combine

Statistical reconstruction of crypts in chimeric patches # crypts 5642 # edges 16902 # triangles 33765 % white 39 % white 33 % white 29 % mixed 14 % mixed 21

Blow up

Approach: Bayesian image restoration data I, a chimeric pattern image unknown crypt layout c = {c i }, the collection of crypt centers crypt reconstructions ĉ by MCMC sampling from p(c I ) p(c) }{{} p(i c) }{{} prior likelihood estimate detection rate P(HET C > 1) or P(HET) as f (ĉ)

Image prior point process prior for crypt centers c { p(c) exp } (i,j) h(d i,j) for a potential function h and inter-crypt distances {d i,j } hard core model; Ripley model; (fixed n) Estimate prior features using crypt arrangement data. Details

Image likelihood Model: p(i c) Premise: crypts are probably pure Latent crypt colors (blue/white) iid Bernoulli(p) If crypt i is blue, each pixel in circle near c i is white w.p. ɛ. If crypt i is white, each pixel in circle near c i is blue w.p. ɛ { } {p p(i c) = pɛ w(i) (1 ɛ) b(i) + (1 p)ɛ b(i) (1 ɛ) w(i) B (1 p) W } i where w(i) and b(i) are numbers of white and blue pixels near i, and W and B are numbers in the intercryptal space.

Posterior sampling For each image: run Metropolis algorithm from regular hexagonal start pluck end state as reconstruction ĉ triangulate ĉ and compute summaries f (ĉ) Fortunately: very low posterior variance of certain features f (c) good robustness to n

Crypt reconstruction indicates short-range interactions

Concluding remarks Intestinal tumors can be polyclonal. Min mouse chimera; improved marker; reduced multiplicity Estimation of the polyclonal fraction is difficult Novelli s bound sensitivity to polyclonal mechanism Heterotypic rates are too large for random collision. stochastic geometry posterior predictive p-values Local interaction at the range of 1 to 2 neighboring crypts explains the data. crypt reconstruction via Bayesian image analysis; disc model

Main references, MA (2006). On estimating the polyclonal fraction in lineage-marker studies of tumor origin. Biostatistics, in press., MA, Clipson, L, Thliveris, AT, and Halberg, RB (2006). A statistical test of the hypothesis that intestinal tumors arise by random collision of initiated clones. Biometrics, in press. Thliveris, AT, Halberg, RB, Clipson L, Dove, WF, Sullivan, R, Washington, MK, Stanhope, S, and, MA (2005). Polyclonality of familial murine adenomas: Analysis of mouse chimeras with low tumor multiplicity suggest short-range interactions. Proc. Natl. Acad. Sci. USA, 102, 6960-6965.

Thanks

Appendices

Random collision hypothesis RC: N initiated cells emerge randomly within intestine Two collide if they are within δ Tumors correspond to connected subsets of the induced graph Is there an argument against random collision using count and size data? Intuition: we see more mixed tumors than expected considering small sizes and low multiplicity

Simple collision theory N = number of initiated crypts = X 1 + 2X 2 + 3X 3 + where X j = number of tumors formed from j collided crypts and thus the number of tumors is X = j X j. From Armitage (1949) E(X 1 ) N exp( 4ψ) ( E(X 2 ) 2N E(X 3 ) N ( ) ψ 4π + 3 3 ψ 2 π 4(2π + 3 ) 3) ψ 2. 3π where ψ = πnδ 2 /(4A) Poisson approximation holds under sparse graph conditions

Testing random collision Use size data to set δ Develop Poisson/neg. binomial model for singles X i 1, doubles X i 2, and triples X i 3, mouse i Conditional on count data {X i = (X1 i + X 2 i + X 3 i )}, simulate posterior of singles, doubles and triples Simulate posterior predictive of numbers sectored randomly paint doubles, triples Compare to observed numbers sectored

Poisson/negative binomial Unknowns µ Z i α (X i 1, X i 2, X i 3 ) expected number of initiated crypts per mouse Gamma(α, α) over-dispersion effect, mouse i shape parameter counts of singles, doubles, triples, mouse i Use conditional Poisson model with N replaced by µz i for mouse i Fit by MCMC

Posterior

Posterior predictive

Random collision test results Tumor count Mouse ID %blue tissue total white blue heterotypic NA p-value 100 20 19 6 5 5 3 0.000 122 85 24 6 13 3 2 0.002 154 20 9 5 2 2 0 0.002 209 60 19 10 2 3 4 0.004 225 30 24 21 0 2 1 0.062 237 50 9 3 2 2 2 0.004 244 40 8 3 0 5 0 0.000 δ = 1.5mm

Model checking Return to main

Novelli s bound compared to θ: biclonal model

Tessellation and Triangulation Return to crypts

A useful image transform

Chimeric pattern image summaries

Disc model Assume S = # heterotypic tumors Binomial {n, θf (r)}, where θ is the polyclonal fraction.

Disc model results Inference Limitations - short-range interactions explain the tumor count data - model allows elementary interactions only - characterization is not in terms of crypt structure

Return to crypts

Crypt data Crypt locations, sample region (data A)

Fitting 1 and checking spatial model of crypt locations 1 maximum pseudo-likelihood

Return to main