High Throughput Gene Mapping Brian S. Yandell

Size: px
Start display at page:

Download "High Throughput Gene Mapping Brian S. Yandell"

Transcription

1 High Thrughput Gene Mapping Brian S. Yandell Summer Research Prgram in Bistatistics June UW-Madisn Yandell

2 central dgma via micrarrays (Bchner 2003) UW-Madisn Yandell

3 what can yu d? participate in hierarchy f research teams bistatistics: Yandell, Kendzirski, 3-5 grad students biinfrmatics: Attie, Lan (bichem), Craven 2 CS grad students bichemistry: (ptinal) weekly interdisciplinary lab meetings cnduct data analysis f 1-2 large data sets 30,000 respnses, 60 individuals, 200 genetic markers learn multivariate statistical & quantitative statistical methds develp innvative graphical summaries develp statistical cmputing tls learn abut cnstructin f R libraries and archiving develp new cde with ptential wide usage transfer research methds t practice thrugh user-friendly cde UW-Madisn Yandell

4 studying diabetes in an F2 segregating crss f inbred lines B6.b x BTBR.b F1 F2 selected mice with b/b alleles at leptin gene (chr 6) measured and mapped bdy weight, insulin, glucse at varius ages (Stehr et al Diabetes) sacrificed at 14 weeks, tissues preserved gene expressin data Affymetrix micrarrays n parental strains, F1 (Nadler et al PNAS; Ntambi et al PNAS) RT-PCR fr a few mrna n 108 F2 mice liver tissues (Lan et al Diabetes; Lan et al Genetics) Affymetrix micrarrays n 60 F2 mice liver tissues design (Jin et al Genetics tent. accept) analysis (wrk in prgress) UW-Madisn Yandell

5 Type 2 Diabetes Mellitus UW-Madisn Yandell

6 Insulin Requirement decmpensatin frm UW-Madisn Unger & Orci FASEB J. (2001) 15,312 Yandell

7 glucse insulin (curtesy AD Attie) UW-Madisn Yandell

8 why map gene expressin as a quantitative trait? cis-r trans-actin? des gene cntrl its wn expressin? r is it influenced by ne r mre ther genmic regins? evidence fr bth mdes (Brem et al Science) simultaneusly measure all mrna in a tissue ~5,000 mrna active per cell n average ~30,000 genes in genme use genetic recmbinatin as natural experiment mechanics f gene expressin mapping measure gene expressin in intercrss (F2) ppulatin map expressin as quantitative trait (QTL) adjust fr multiple testing UW-Madisn Yandell

9 idea f mapping micrarrays (Jansen Nap 2001) UW-Madisn Yandell

10 interval mapping basics bserved measurements Y = phentypic trait X = markers & linkage map i = individual index 1,,n missing data missing marker data Q = QT gentypes alleles QQ, Qq, r qq at lcus unknwn quantities λ = QT lcus (r lci) θ = phentype mdel parameters m = number f QTL pr(q X,λ,m) recmbinatin mdel grunded by linkage map, experimental crss recmbinatin yields multinmial fr Q given X pr(y Q,θ,m) phentype mdel distributin shape (assumed nrmal here) unknwn parameters θ (culd be nn-parametric) bserved X Y missing Q unknwn λ θ after Sen Churchill (2001) UW-Madisn Yandell

11 recmbinatin mdel pr(q X,λ) lcus λ is distance alng linkage map identifies flanking marker regin flanking markers prvide gd apprximatin map assumed knwn frm earlier study inaccuracy slight using nly flanking markers extend t next flanking markers if missing data culd cnsider mre cmplicated relatinship but little change in results pr(q X,λ) = pr(gen map, lcus) pr(gen flanking markers, lcus) λ chrmsme X Q? k X k 1 UW-Madisn Yandell

12 idealized phentype mdel trait = mean additive errr trait = effect_f_gen errr pr( trait gen, effects ) Y = G Q E Qq pr( Y Q, θ) = qq QQ nrmal( G Q 2, σ ) UW-Madisn Yandell

13 interval mapping bjective likelihd mixes ver gentypes Q L(λ,θ Y) = prduct i [sum Q pr(q X i,λ) pr(y i Q,θ)] maximize likelihd t estimate lci & effects LOD = lg10( L(λ,θ Y) / null likelihd ) Bayesian psterir samples Q as missing data pr(λ,q,θ Y,X) = pr(λ,θ) prduct i pr(q i X i,λ) pr(y i Q i,θ) average ver unknwn Q t study lci & effects UW-Madisn Yandell

14 simple LOD map fr PDI: cis-regulatin (Lan et al. 2003) LOD = 4.6 * LR marker map acrss 19 chrmsmes UW-Madisn Yandell

15 cmplicated trans-actin fr SCD1 (3-4 gene regins influence expressin f SCD1) dminance? UW-Madisn Yandell

16 statistical interactin fr SCD1 epistasis LOD peaks jint LOD peaks UW-Madisn Yandell

17 multiple QTL phentype mdel phentype affected by gentype & envirnment pr(y Q,θ) ~ N(G Q, σ 2 ) Y = G Q envirnment partitin gentypic mean int QTL effects G Q = µ β 1 (Q)... β m (Q) β 12 (Q)... G Q = mean main effects epistatic interactins general frm f QTL effects fr mdel M G Q = µ sum j in M β j (Q) M = number f terms in mdel M < 2 m UW-Madisn Yandell

18 $60,000 experiment UW-Madisn Yandell

19 expressin pleitrpy in yeast genme (Brem et al. 2002) crdinated expressin in muse genme (Schadt et al. 2003) UW-Madisn Yandell

20 frm gene expressin t super-genes PC r SVD decmpsitin f multiple traits Y = t traits n individuals decmpse as Y = UDW T U,W = rth-nrmal transfrms (eigen-vectrs) D = diagnal matrix with singular values transfrm prblem t principal cmpnents W 1 and W 2 uncrrelated "super-traits" interval map each PC separately W 1 = G * 1Q e * 1 may nly need t map a few PCs UW-Madisn Yandell

21 PC simply rtates & rescales t find majr axes f variatin mrna V mrna V1 UW-Madisn Yandell

22 multivariate screen fr gene expressing mapping principal cmpnents PC1(red) and SCD(black) PC2 (22%) PC1 (42%) UW-Madisn Yandell

23 PC acrss micrarray functinal grups 1500 mrna f 30, functinal grups 60 mice 2-35 mrna / grup which are interesting? examine PC1, PC2 circle size = # unique mrna UW-Madisn Yandell

24 hw well des PC1 d? ld peaks fr 2 QTL at best pair f chr vs. 500 permutatins UW-Madisn Yandell

25 factr ladings fr PC1&2 UW-Madisn Yandell

26 fcus n translatin machinery (EIF) UW-Madisn Yandell

27 UW-Madisn Yandell

28 chr 4 regin chr 15 regin UW-Madisn Yandell

29 UW-Madisn Yandell imprvements n PC? what is ur gal? reduce dimensinality fcus n QTL PC reduces dimensinality but may nt relate t genetics discriminant analysis (DA) rtate t imprve discriminatin red at each putative QTL Gilbert and le Ry (2003,2004)

30 genetic & envirnmental crrelatin with multiple traits genetic nly envirnmental nly Krl et al. (2001) bth UW-Madisn Yandell

31 discriminant analysis by marker pairs fr SCD1-influencing chrmsmes Chrmsme Chrmsme UW-Madisn Yandell

32 DA fr mre chrmsmes (mask values belw 8) Chrmsme Chrmsme UW-Madisn Yandell

33 what is the bilgical gal? understand bilgy f diabetes & besity find genes influencing mrna expressin lcalize genmic regins f high influence crdinated regulatin f many mrna? search databases fr candidate genes there find mrna expressin with strng signals priritize subset f 30,000 mrna find genmic regins that influence them cnduct fllwup experiments new genetic crsses, mre tissues detailed assays f bichemical pathways UW-Madisn Yandell