Update on the Genomics Data in the Health and Retirement Study. Sharon Kardia, Jennifer A. Smith, University of Michigan, April 2013
2 Genetic variation in SNPs (Single Nucleotide Polymorphisms)
ATTGCAATCCGTGG...ATCGAGCCA.TACGATTGCACGCCG
ATTGCAAGCCGTGG...ATCTAGCCA.TACGATTGCAAGCCG
ATTGCAAGCCGTGG...ATCTAGCCA.TACGATTGCAAGCCG
ATTGCAATCCGTGG...ATCGAGCCA.TACGATTGCACGCCG
ATTGCAAGCCGTGG...ATCTAGCCA.TACGATTGCAAGCCG
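To make the idea concrete, this minimal Python sketch scans the five aligned sequences on the slide and reports the positions where they disagree, i.e. the SNPs. It assumes the "..." gaps can simply be dropped, so positions are relative to the joined fragments only, not to any genome coordinate.

```python
# The five sequences from the slide with the "..." gaps dropped (an assumption:
# positions are relative to these joined fragments only).
SEQUENCES = [
    "ATTGCAATCCGTGG" + "ATCGAGCCA" + "TACGATTGCACGCCG",
    "ATTGCAAGCCGTGG" + "ATCTAGCCA" + "TACGATTGCAAGCCG",
    "ATTGCAAGCCGTGG" + "ATCTAGCCA" + "TACGATTGCAAGCCG",
    "ATTGCAATCCGTGG" + "ATCGAGCCA" + "TACGATTGCACGCCG",
    "ATTGCAAGCCGTGG" + "ATCTAGCCA" + "TACGATTGCAAGCCG",
]


def snp_positions(seqs):
    """Return 0-based positions at which the aligned sequences do not all agree."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def minor_allele_frequency(seqs, pos):
    """Frequency of the less common base at a position (assumes two alleles)."""
    counts = {}
    for s in seqs:
        counts[s[pos]] = counts.get(s[pos], 0) + 1
    return min(counts.values()) / len(seqs)


for pos in snp_positions(SEQUENCES):
    alleles = sorted({s[pos] for s in SEQUENCES})
    print(pos, alleles, minor_allele_frequency(SEQUENCES, pos))
```

On these sequences the sketch finds three SNPs (positions 7, 17 and 33), each with a minor allele frequency of 0.4.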
3 Genotypes are called with varying uncertainty
[Figure: genotype cluster plot; axes: intensity of allele A, intensity of allele G]
4 Two easy ways of dealing with uncertain genotypes
1. Genotype calling: choose the most likely genotype and continue as if it were true (p11 = 10%, p12 = 20%, p22 = 70% => G = 2).
2. Mean genotype (dosage): use the probability-weighted average genotype (p11 = 10%, p12 = 20%, p22 = 70% => G = 0(0.10) + 1(0.20) + 2(0.70) = 1.6).
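The two options can be sketched in a few lines of Python. The coding of genotypes 11, 12 and 22 as 0, 1 and 2 copies of allele 2 is an assumption that matches the slide's G = 2 and G = 1.6.

```python
def best_guess(probs):
    """Genotype calling: return 0, 1 or 2 (copies of allele 2) for the most likely genotype."""
    return max(range(3), key=lambda g: probs[g])


def dosage(probs):
    """Mean genotype (dosage): expected copies of allele 2, i.e. sum of g * P(g)."""
    return sum(g * p for g, p in enumerate(probs))


# P(11), P(12), P(22) from the slide, with genotypes coded 0, 1, 2.
probs = (0.10, 0.20, 0.70)
print(best_guess(probs))         # 2
print(round(dosage(probs), 6))   # 1.6
```

The dosage keeps the uncertainty (1.6 rather than 2), which is why it is usually preferred as the predictor in association tests on imputed data.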
5 After Data Cleaning: Imputation
HapMap Consortium reference (completed)
- Used in first wave of GWAS
- Imputed ~2.4 million SNPs
- Only completed on whites and blacks
- Not posted
1000 Genomes reference (completed)
- Used in second wave of GWAS
- Imputed ~22 million SNPs
- Posted to dbGaP in August
6 Different Genotyping Platforms measure different SNPs
7 Linkage Disequilibrium (LD) is the correlation among mutations across SNPs
Markers close together on chromosomes are often transmitted together, creating a correlation between the mutations. LD also arises when populations mix (admixture).
8 Basic Concepts
Parent 1 (haplotypes A B and a b) X Parent 2 (haplotypes A B and a b)
High LD -> no recombination: offspring inherit only the parental A B and a b haplotypes (r² = 1), so SNP1 tags SNP2.
Low LD -> recombination: A b and a B haplotypes also appear, giving many possible combinations.
9 Key Terms
LD (linkage disequilibrium): for a pair of SNP alleles, a measure of deviation from random association; measured by D or r².
Phased haplotypes: the estimated combination of SNP alleles carried together on one chromosome in a genomic region. Because we have two copies of each chromosome, each person has a pair of haplotypes.
Tag SNPs: the minimum set of SNPs needed to identify a haplotype. High LD (e.g., r² > 0.8) indicates that two SNPs are nearly redundant, so each one acts as a tag for the other.
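As a sketch of how D and r² are computed, the snippet below uses the standard definitions D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), which are assumed here rather than spelled out on the slide. Haplotypes are written as two-letter strings, with uppercase marking the tracked allele at each SNP, and the r² > 0.8 tag rule is the example threshold from the slide.

```python
def ld_stats(haplotypes):
    """D and r^2 for two biallelic SNPs from phased haplotypes such as "AB" or "ab"."""
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n   # frequency of A at SNP1
    p_b = sum(h[1] == "B" for h in haplotypes) / n   # frequency of B at SNP2
    p_ab = sum(h == "AB" for h in haplotypes) / n    # frequency of the AB haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# No recombination: only the parental AB and ab haplotypes occur.
high = ["AB", "ab", "AB", "ab"]
# Recombination: all four haplotypes equally common.
low = ["AB", "Ab", "aB", "ab"]
for name, haps in (("high LD", high), ("low LD", low)):
    d, r2 = ld_stats(haps)
    print(name, round(d, 3), round(r2, 3), "tag" if r2 > 0.8 else "not a tag")
```

The first case reproduces the slide's r² = 1 (SNP1 tags SNP2); in the second, recombination has made the two SNPs independent, so D and r² are both 0.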
10 HapMap Project
Phase 1: 269 samples (4 panels); genotyping centers: HapMap International Consortium; unique QC+ SNPs: 1.1 M; reference: Nature (2005) 437:1299
Phase 2*: 270 samples (4 panels); genotyping centers: Perlegen; unique QC+ SNPs: 3.8 M (phase I+II); reference: Nature (2007) 449:851
Phase 3: 1,115 samples (11 panels); genotyping centers: Broad & Sanger; unique QC+ SNPs: 1.6 M (Affy 6.0 & Illumina 1M); reference: Draft Rel. 1 (May 2008)
*This is the version that most genetics consortia are using.
11 Haplotype Blocks are mapped across the genome (for each ethnicity)
12 Figure 2. A schematic of SNP types as defined in the IMPUTE2 imputation algorithm. Each individual is represented by a unique color in the horizontal bar(s), and alternate alleles at each SNP are represented as A and B. Section (A) represents phased reference haplotypes, where two samples (4 phased chromosomes) are shown. Section (B) represents three study samples with SNP genotype calls, as would be observed in a GWAS array experiment. Section (C) identifies the SNP type of each position shown. Type 2 SNPs have data in both the reference and the study samples: positions 1, 4, 6, 8, and 11. Type 0 SNPs have data in the reference but not in the study samples: positions 3, 5, 9, 10, and 12. Thus, data at type 2 SNPs (imputation basis) are used to impute type 0 SNPs (imputation target) in the study samples. Type 3 SNPs are those in study samples but not in the reference; ultimately, these SNPs are extraneous to the imputation, which is why they are shown in white text. This figure is based on IMPUTE2 background documentation (see Web Resources).
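The type definitions reduce to set operations on the positions typed in each data set. In this sketch the type 2 and type 0 positions come from the caption; positions 2 and 7 are hypothetical study-only positions added to illustrate type 3.

```python
# Positions present in the reference panel (types 2 and 0 from the caption).
reference_positions = {1, 3, 4, 5, 6, 8, 9, 10, 11, 12}
# Positions genotyped in the study; 2 and 7 are hypothetical study-only SNPs.
study_positions = {1, 2, 4, 6, 7, 8, 11}


def snp_types(reference, study):
    """Classify positions with the IMPUTE2 type labels described in Figure 2."""
    return {
        2: sorted(reference & study),   # imputation basis: in both
        0: sorted(reference - study),   # imputation target: reference only
        3: sorted(study - reference),   # study only: not used by imputation
    }


types = snp_types(reference_positions, study_positions)
print(types)
# The imputed output covers the basis plus the targets.
print("imputation output:", len(types[2]) + len(types[0]), "SNPs")
```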
13 Observed genotypes [Figure: study sample genotypes compared with HapMap reference haplotypes] (Gonçalo Abecasis)
14 Identify match among reference (Gonçalo Abecasis)
15 Phase chromosomes, impute missing genotypes (Gonçalo Abecasis)
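Slides 13 to 15 describe matching a study sample against reference haplotypes and then filling in what was not genotyped. A minimal sketch of that idea, using made-up 0/1 haplotypes and ignoring recombination (a single pair is used for the whole region), picks the pair of reference haplotypes whose allele sums best agree with the typed genotypes and reads the untyped genotypes off that pair.

```python
from itertools import combinations_with_replacement


def impute_by_best_pair(genotypes, reference):
    """Fill missing genotypes (None) from the best-matching pair of reference haplotypes.

    genotypes: allele counts 0/1/2 per marker, None where the marker was not typed.
    reference: list of haplotypes, each a list of 0/1 alleles at the same markers.
    """
    def mismatches(pair):
        h1, h2 = reference[pair[0]], reference[pair[1]]
        return sum(g != a + b for g, a, b in zip(genotypes, h1, h2) if g is not None)

    best = min(combinations_with_replacement(range(len(reference)), 2), key=mismatches)
    h1, h2 = reference[best[0]], reference[best[1]]
    filled = [a + b if g is None else g for g, a, b in zip(genotypes, h1, h2)]
    return best, filled


# Hypothetical reference haplotypes and a study sample typed at markers 0, 2 and 4.
reference = [
    [0, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
]
observed = [0, None, 2, None, 1, None]
print(impute_by_best_pair(observed, reference))
```

Here only haplotypes 0 and 2 together explain the typed genotypes, so the sample is phased onto that pair and markers 1, 3 and 5 are imputed as 1.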
18 One more tricky piece: Haplotypes cross over (recombine)
19 Hidden Markov Model
Hidden state S_m: the pair of contributing reference haplotypes at marker m
Data G_m: observed genotypes at marker m
Goal: infer S_m
20 Algorithm
Update one individual at a time: construct haplotypes that match the observed genotypes from a pool of reference haplotypes.
Forward: calculate forward probabilities cumulatively, marker by marker up to the last marker, for the observed genotypes and the haplotype affiliation state S_m.
Backward: sample haplotype affiliation states (i.e., construct mosaic haplotypes) probabilistically, according to the forward probabilities and the transition probabilities.
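The forward pass and backward sampling can be illustrated with a simplified, haploid version of the model: here the hidden state is a single reference haplotype rather than a pair, and the switch rate theta and mismatch rate eps are illustrative values, not MaCH or IMPUTE2 parameters. Transitions either stay on the same haplotype or jump uniformly to any haplotype, which lets the sampled path form a mosaic.

```python
import random


def forward(obs, reference, theta=0.01, eps=0.01):
    """Forward probabilities, normalized at each marker, for a haploid copying model.

    obs: observed alleles (0/1) per marker, None where the marker was not typed.
    reference: K reference haplotypes, each a list of 0/1 alleles.
    theta: chance of switching to a random reference haplotype between markers.
    eps: chance that an observed allele mismatches the haplotype being copied.
    """
    k = len(reference)
    rows, prev = [], None
    for m, allele in enumerate(obs):
        row = []
        for s in range(k):
            # Transition: stay on haplotype s, or jump uniformly (prev sums to 1).
            prior = 1.0 / k if prev is None else (1 - theta) * prev[s] + theta / k
            if allele is None:
                emit = 1.0  # untyped marker carries no information
            else:
                emit = 1 - eps if reference[s][m] == allele else eps
            row.append(prior * emit)
        total = sum(row)
        prev = [x / total for x in row]
        rows.append(prev)
    return rows


def sample_path(rows, theta=0.01, rng=random):
    """Backward step: sample one reference label per marker, starting at the last marker."""
    k = len(rows[-1])
    path = [rng.choices(range(k), weights=rows[-1])[0]]
    for row in reversed(rows[:-1]):
        nxt = path[-1]
        # P(S_m = j | S_{m+1} = nxt, data up to m) is proportional to f_m(j) * T(j, nxt).
        weights = [row[j] * ((1 - theta) * (j == nxt) + theta / k) for j in range(k)]
        path.append(rng.choices(range(k), weights=weights)[0])
    path.reverse()
    return path


def impute(obs, reference, path):
    """Fill untyped markers with the allele of the sampled (mosaic) haplotype."""
    return [reference[s][m] if a is None else a for m, (a, s) in enumerate(zip(obs, path))]


reference = [
    [0, 0, 1, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 1, 0, 0, 0],
]
observed = [0, None, 1, 1, None, 0, 1, None]  # hypothetical study haplotype
rows = forward(observed, reference)
path = sample_path(rows, rng=random.Random(0))
print(path, impute(observed, reference, path))
```

Because sampling is random, repeating the backward step gives different mosaics; averaging the imputed alleles over many samples yields dosages like those on slide 4.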
21 1000 Genomes Imputation Strategy
- Impute the entire sample
- Pre-phase (estimate haplotypes)
- Impute SNPs with > 4 copies of the minor allele in any of the following 1000 Genomes continental groups: African (AFR), Admixed American (AMR), East Asian (ASN), European (EUR)
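The minor-allele-count rule can be sketched as a small filter. This slide words it as "> 4 copies" while the notes on slide 23 say "at least 4 copies", so the comparison is left as a parameter; the per-group counts in the example are hypothetical.

```python
def keep_for_imputation(minor_allele_counts, min_copies=4, strict=True):
    """Decide whether a reference-only SNP is kept as an imputation target.

    minor_allele_counts: copies of the minor allele per continental group,
        e.g. {"AFR": 3, "AMR": 0, "ASN": 1, "EUR": 6} (hypothetical numbers).
    strict=True keeps SNPs with more than min_copies copies in any group;
    strict=False keeps SNPs with at least min_copies copies.
    """
    if strict:
        return any(c > min_copies for c in minor_allele_counts.values())
    return any(c >= min_copies for c in minor_allele_counts.values())


snp = {"AFR": 4, "AMR": 0, "ASN": 1, "EUR": 0}
print(keep_for_imputation(snp), keep_for_imputation(snp, strict=False))  # False True
```

Evaluating each group separately, rather than the pooled panel, keeps variants that are common in one continental group even if they are rare overall.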
22 The 1000 Genomes Reference Sample
Full population name (abbreviation): number of samples
African Ancestry in Southwest US (ASW): 61
Luhya in Webuye, Kenya (LWK): 97
Yoruba in Ibadan, Nigeria (YRI): 88
Total African ancestry: 246
Colombian in Medellin, Colombia (CLM): 60
Mexican Ancestry in Los Angeles, CA (MXL): 66
Puerto Rican in Puerto Rico (PUR): 55
Total American ancestry: 181
Han Chinese in Beijing, China (CHB): 97
Han Chinese South, China (CHS): 100
Japanese in Tokyo, Japan (JPT): 89
Total Asian ancestry: 286
Utah residents (CEPH) with Northern and Western European ancestry (CEU): 85
Toscani in Italia (TSI): 98
British in England and Scotland (GBR): 89
Finnish in Finland (FIN): 93
Iberian populations in Spain (IBS): 14
Total European ancestry: 379
An overview of the 1,092 samples in the 1000 Genomes Project worldwide reference panel (phase I integrated variant set v3, March 2012), which was used to impute all study participants. Each population was assigned to one of four continental groupings: African (AFR), American (AMR), Asian (ASN), and European (EUR). All haplotypes in the phased reference panel are for unrelated, founder individuals only. This table is based on reference panel data downloaded from IMPUTE2 and the sample summary provided by the Project (see Web resources).
23 1000 Genomes Imputation
[Table: study SNPs, imputation basis, and imputation output for chromosomes 1-22 and X; per-chromosome values are not legible]
Totals: study SNPs 2,195,306; imputation basis 2,065,320; imputation output 21,632,048.
Study SNPs: study SNPs passing pre-imputation filters (IMPUTE2 SNP types 2 and 3). Imputation basis: study SNPs passing pre-imputation filters and overlapping with the reference panel (type 2). Imputation output: the sum of imputation basis (type 2) and imputation target (type 0) SNPs. Type 0 SNPs have been restricted to those with at least 4 copies of the minor allele in AFR, AMR, ASN, or EUR reference samples.
24 Quality control: Prediction of Known Genotypes
MAF (in study samples) < 0.1: number of SNPs 1,059,…; overall concordance mean … (median 0.998); empirical dosage r² mean … (median 0.919)
MAF (in study samples) ≥ 0.1: number of SNPs 1,005,…; overall concordance mean … (median 0.994); empirical dosage r² mean … (median 0.994)
Quality metrics for all masked SNPs, dichotomized into groups of MAF < 0.1 vs. MAF ≥ 0.1. The second column shows the number of SNPs in each MAF group. Mean and median values are presented for overall genotype concordance and empirical dosage r² (in IMPUTE2 metrics files, labeled as concord_type0 and r2_type0, respectively). No info threshold has been applied here, so all masked and imputed SNPs in each MAF category are included in these averages.
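The idea behind both metrics at a masked SNP can be sketched as follows: concordance compares the best-guess call with the hidden true genotype, and empirical dosage r² is the squared correlation between the true genotype and the imputed dosage. This is an illustration with made-up data, not IMPUTE2's own computation.

```python
def best_guess(probs):
    """Most likely genotype (0, 1 or 2) from (P(0), P(1), P(2))."""
    return max(range(3), key=lambda g: probs[g])


def dosage(probs):
    """Expected genotype from (P(0), P(1), P(2))."""
    return sum(g * p for g, p in enumerate(probs))


def masked_snp_quality(true_genotypes, posteriors):
    """Concordance and empirical dosage r^2 at one masked SNP.

    true_genotypes: array genotypes (0/1/2) that were hidden from the imputation.
    posteriors: imputed (P(0), P(1), P(2)) for the same people.
    """
    calls = [best_guess(p) for p in posteriors]
    concordance = sum(c == t for c, t in zip(calls, true_genotypes)) / len(calls)
    dos = [dosage(p) for p in posteriors]
    n = len(dos)
    mt, md = sum(true_genotypes) / n, sum(dos) / n
    sxy = sum((t - mt) * (d - md) for t, d in zip(true_genotypes, dos))
    sxx = sum((t - mt) ** 2 for t in true_genotypes)
    syy = sum((d - md) ** 2 for d in dos)
    r2 = sxy * sxy / (sxx * syy) if sxx > 0 and syy > 0 else float("nan")
    return concordance, r2


truth = [0, 1, 2, 1, 0]
posteriors = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.0, 0.3, 0.7), (0.6, 0.4, 0.0), (0.8, 0.2, 0.0)]
print(masked_snp_quality(truth, posteriors))
```

For rare variants most people are homozygous for the common allele, so concordance can stay high even when dosage r² is poor; this is consistent with the MAF < 0.1 row above (median concordance 0.998 vs. median r² 0.919).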
25 Figure 3. Summaries of quality metrics at all imputed SNPs. Panel A shows the distribution of the info quality metric, with a dashed line indicating a potential threshold value of 0.3. Panel B is the distribution of certainty, the average certainty of best-guess genotypes. Panel C summarizes the relationship between the info score and MAF; the secondary axis indicates the count of SNPs in each MAF bin (0.01 intervals).
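Certainty, as described for Panel B, is the average probability of the best-guess genotype at a SNP. A minimal sketch, with hypothetical posteriors, computes it together with a dosage-based MAF of the kind plotted in Panel C; the info score itself uses a more involved formula and is not reproduced here.

```python
def certainty(posteriors):
    """Average probability of the best-guess genotype across people at one SNP."""
    return sum(max(p) for p in posteriors) / len(posteriors)


def maf_from_dosages(posteriors):
    """Minor allele frequency estimated from expected allele counts (dosages)."""
    freq = sum(p[1] + 2 * p[2] for p in posteriors) / (2 * len(posteriors))
    return min(freq, 1 - freq)


posteriors = [(0.90, 0.10, 0.00), (0.20, 0.70, 0.10), (0.05, 0.15, 0.80)]
print(round(certainty(posteriors), 3), round(maf_from_dosages(posteriors), 3))
```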
26 Figure 4. A comparison of imputation quality metrics by chromosome for all imputed SNPs: info in panel A and certainty in panel B. Outlier values are not displayed in these box plots. On the x axis, 23 denotes the X chromosome.
27 Figure 5. Quality metrics for all masked SNPs, grouped into MAF bins at 0.01 intervals. Panel (A) shows the number of SNPs per MAF bin and, on the secondary y axis, the fraction of SNPs in the bin passing an info filter threshold of 0.8. Panel (B) plots the average empirical dosage r² metric per MAF bin, both before and after filtering on the info score (black and gray data series, respectively). Similarly, panel (C) shows the concordance between the observed and the most likely imputed genotype at masked SNPs within each MAF bin, with and without the info filter.
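The binning behind Panel (A) can be sketched as follows: SNPs are grouped into 0.01-wide MAF bins and, for each bin, the count and the fraction with info ≥ 0.8 are reported. The input pairs are hypothetical, and treating the threshold as "≥" is an assumption.

```python
def pass_fraction_by_maf_bin(snps, info_threshold=0.8, width=0.01):
    """Count SNPs per MAF bin and the fraction passing an info filter.

    snps: iterable of (maf, info) pairs, with maf in [0, 0.5].
    Returns {bin_index: (lower_edge, n_snps, fraction_passing)}.
    """
    n_bins = round(0.5 / width)
    counts = {}
    for maf, info in snps:
        # The small offset guards against 0.03 / 0.01 evaluating to 2.999...;
        # a MAF of exactly 0.5 is placed in the last bin.
        b = min(int(maf / width + 1e-9), n_bins - 1)
        n, passed = counts.get(b, (0, 0))
        counts[b] = (n + 1, passed + (info >= info_threshold))
    return {b: (round(b * width, 10), n, passed / n) for b, (n, passed) in sorted(counts.items())}


snps = [(0.005, 0.9), (0.007, 0.5), (0.25, 0.95), (0.255, 0.85), (0.5, 0.99)]
for b, (edge, n, frac) in pass_fraction_by_maf_bin(snps).items():
    print(f"MAF bin starting at {edge:.2f}: {n} SNPs, {frac:.0%} pass info >= 0.8")
```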
28 ApoE Imputation
29 Description (Whites / African Americans / Hispanics)
Gene and SNPs: APOE; rs429358, rs7412
Is the SNP genotyped on the Omni2.5? rs429358: No; rs7412: Yes
Call rate on Omni2.5 (from CIDR): rs429358: N/A; rs7412: 0% from CIDR, 90% after re-clustering
Imputation reference database (build and website): 1000 Genomes reference panels generated at the University of Michigan (Interim Phase I, data freeze, haplotypes), http://… PhaseI Interim.html
Imputation reference panel:
- Whites: the EUR reference panel (87 CEU + 98 TSI + 89 GBR + 93 FIN + 14 IBS)
- African Americans: the EUR + AFR (88 YRI + 97 LWK + 61 ASW) reference panels
- Hispanics: the EUR + AMR (60 CLM + 66 MXL + 55 PUR) reference panels
30 Description (Whites / African Americans / Hispanics)
Pre-imputation quality control (all groups): exclude SNPs with MAF < 0.01 or HWE p < …
Region and number of SNPs in imputation (all groups): … SNPs upstream and 1000 SNPs downstream on chromosome 19
Imputation program (including settings): Whites: MaCH …, one step, 25 rounds; African Americans: MaCH …, one step, 50 rounds; Hispanics: MaCH …, one step, 50 rounds
Number of people included in imputation: not legible
Imputation quality (Rsq): per-SNP values not legible
31 Description (Whites / African Americans / Hispanics)
Validation dataset: ADAMS (all groups)
Number of participants in validation dataset: not legible
Concordance rate (APOE genotype): Whites 113/117 (96.6%); African Americans 25/28 (89.3%); Hispanics 15/15 (100%)
Quality control procedures: set genotype to missing if posterior probability < 0.90. Whites: N=2 for rs7412, N=4 for rs429358; African Americans: N=1 for rs7412, N=2 for rs429358; Hispanics: N=1 for rs7412, N=0 for rs429358
Concordance rate after quality control procedures (APOE genotype): Whites 111/114 (97.4%); African Americans 23/25 (92.0%); Hispanics 14/14 (100%)
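The validation step above can be sketched as: drop imputed calls whose posterior probability is below 0.90, then count agreement with the validation genotypes among the calls that remain. The genotypes and posteriors below are hypothetical; the last line only re-derives the percentages reported for Whites.

```python
def concordance_after_filter(calls, truth, min_posterior=0.90):
    """Agreement between imputed and validation genotypes after a posterior filter.

    calls: (imputed_genotype, posterior_probability) per person.
    truth: validation genotypes (e.g. from ADAMS) in the same order.
    Calls below min_posterior are treated as missing and leave the denominator.
    """
    kept = [(g, t) for (g, p), t in zip(calls, truth) if p >= min_posterior]
    agree = sum(g == t for g, t in kept)
    return agree, len(kept)


calls = [("e3/e3", 0.99), ("e3/e4", 0.85), ("e2/e3", 0.95), ("e3/e4", 0.97)]
truth = ["e3/e3", "e3/e3", "e2/e3", "e4/e4"]
print(concordance_after_filter(calls, truth, min_posterior=0.0))  # (2, 4)
print(concordance_after_filter(calls, truth))                     # (2, 3)
print(f"{113 / 117:.1%} -> {111 / 114:.1%}")                      # 96.6% -> 97.4%
```

Filtering trades a smaller denominator for a higher agreement rate, which is the pattern seen for Whites and African Americans above.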
32 Algorithm to get APOE genotype from the best-guess genotypes of SNPs rs7412 and rs429358
rs7412 T/T + rs429358 T/T -> e2/e2
rs7412 C/T + rs429358 T/T -> e2/e3
rs7412 C/T + rs429358 C/T -> e2/e4
rs7412 C/C + rs429358 T/T -> e3/e3
rs7412 C/C + rs429358 C/T -> e3/e4
rs7412 C/C + rs429358 C/C -> e4/e4
Genotype frequencies of APOE (ApoE imputation in HRS), N (%): Whites (N=8,652) / AA (N=1,519) / HRS_all (N=12,367)
e2/e2: 48 (0.56) / 20 (1.32) / 78 (0.63)
e2/e3: 1131 (13.07) / 226 (14.88) / 1555 (12.57)
e2/e4: 199 (2.30) / 73 (4.81) / 301 (2.43)
e3/e3: 5197 (60.07) / 704 (46.35) / 7415 (59.96)
e3/e4: 1890 (21.84) / 434 (28.57) / 2751 (22.24)
e4/e4: 187 (2.16) / 62 (4.08) / 267 (2.16)
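The lookup above translates directly into a dictionary keyed by the two best-guess genotypes. The allele-frequency helper is an added illustration, applied here to the counts for Whites from the table; combinations not in the table return None.

```python
# (rs7412 best guess, rs429358 best guess) -> APOE genotype, from the table above.
APOE_TABLE = {
    ("T/T", "T/T"): "e2/e2",
    ("C/T", "T/T"): "e2/e3",
    ("C/T", "C/T"): "e2/e4",
    ("C/C", "T/T"): "e3/e3",
    ("C/C", "C/T"): "e3/e4",
    ("C/C", "C/C"): "e4/e4",
}


def normalize(genotype):
    """Order alleles alphabetically so "T/C" and "C/T" map to the same key."""
    return "/".join(sorted(genotype.split("/")))


def apoe_genotype(rs7412, rs429358):
    """Return the APOE genotype, or None for combinations not in the table."""
    return APOE_TABLE.get((normalize(rs7412), normalize(rs429358)))


def allele_frequency(counts, allele):
    """Frequency of one APOE allele (e.g. "e4") from genotype counts."""
    total = 2 * sum(counts.values())
    copies = sum(n * genotype.split("/").count(allele) for genotype, n in counts.items())
    return copies / total


whites = {"e2/e2": 48, "e2/e3": 1131, "e2/e4": 199, "e3/e3": 5197, "e3/e4": 1890, "e4/e4": 187}
print(apoe_genotype("C/C", "T/C"))               # e3/e4
print(round(allele_frequency(whites, "e4"), 3))  # 0.142
```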
33 Principal Components and Population Stratification
34 Population Stratification
Problem: disease and mutation frequencies are confounded by race.
Example: imagine that hypertension has a prevalence of 50% in blacks and 25% in whites. If we do a genetic analysis in the HRS (20 million simple chi-square tests), the vast majority of the hits will reflect black/white differences rather than hypertension.
Solution: estimate and adjust for genetic variability using principal components.
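The hypertension example can be checked with expected counts instead of real data. In this sketch the prevalences are the slide's (50% and 25%), while the allele frequencies (0.6 and 0.2) and group sizes are hypothetical. Within each group the allele is unrelated to disease, yet the pooled allelic chi-square test is highly significant.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]]."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2) for j in range(2)
    )


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def allele_table(groups):
    """Expected allele counts [[case risk, case other], [control risk, control other]].

    groups: list of (n_people, prevalence, risk_allele_freq); within each group the
    allele is unrelated to disease, so cases and controls share the same frequency.
    """
    table = [[0.0, 0.0], [0.0, 0.0]]
    for n, prevalence, freq in groups:
        for row, people in ((0, n * prevalence), (1, n * (1 - prevalence))):
            alleles = 2 * people
            table[row][0] += alleles * freq
            table[row][1] += alleles * (1 - freq)
    return table


blacks = (1000, 0.50, 0.6)  # prevalence from the slide; frequency hypothetical
whites = (1000, 0.25, 0.2)
pooled = chi_square_2x2(allele_table([blacks, whites]))
print(f"pooled chi2 = {pooled:.2f}, p = {p_value_1df(pooled):.1e}")
print("within blacks:", chi_square_2x2(allele_table([blacks])))
```

Adjusting for ancestry (for example, with principal components as covariates) is meant to remove exactly this between-group signal.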
35 Analysis of Genotypes Only
Principal component analysis reveals the SNP vectors (weighted combinations of SNPs) that explain the largest variation in the data
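A small pure-Python sketch of the idea: standardize each SNP, build the person-by-person matrix Z Zᵀ, and extract the first principal component by power iteration. The genotypes are hypothetical (two groups with opposite allele frequencies), and real analyses run dedicated software on many LD-pruned SNPs.

```python
import math

# Hypothetical genotypes (0/1/2) for 8 people at 6 SNPs: people 0-3 come from one
# population and 4-7 from another with opposite allele frequencies.
GENOTYPES = [
    [2, 2, 1, 0, 0, 1],
    [2, 1, 2, 0, 1, 0],
    [1, 2, 2, 1, 0, 0],
    [2, 2, 2, 0, 0, 0],
    [0, 0, 1, 2, 2, 1],
    [0, 1, 0, 2, 1, 2],
    [1, 0, 0, 1, 2, 2],
    [0, 0, 0, 2, 2, 2],
]


def standardize(genotypes):
    """Scale each SNP to (x - 2p) / sqrt(2p(1 - p)); monomorphic SNPs are dropped."""
    n = len(genotypes)
    columns = []
    for snp in zip(*genotypes):
        p = sum(snp) / (2 * n)
        if 0 < p < 1:
            scale = math.sqrt(2 * p * (1 - p))
            columns.append([(x - 2 * p) / scale for x in snp])
    return [list(row) for row in zip(*columns)]


def first_pc(genotypes, iterations=200):
    """Scores of each person on PC1, by power iteration on the matrix Z Z^T."""
    z = standardize(genotypes)
    n = len(z)
    m = [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)] for i in range(n)]
    v = [1.0 + i for i in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


print([round(score, 2) for score in first_pc(GENOTYPES)])
```

The two groups receive opposite signs on PC1; the overall sign is arbitrary. The same standardization reappears in the genetic relationship matrix on slide 53.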
36 PCA of POPRES cohort
37 The HRS Participants + HapMap Participants
Figure 1. Principal component analysis of 12,507 unique study participants and 1,230 HapMap controls, using a set of 96,134 autosomal SNPs pruned for both long- and short-range linkage disequilibrium. For study samples, color coding is according to self-identified race, while symbol denotes ethnicity (Hispanic or non-Hispanic). HapMap samples are color coded by membership in 1 of 11 Phase 3 populations: ASW: African ancestry in Southwest USA; CEU: Utah residents with Northern and Western European ancestry from the CEPH collection; CHB: Han Chinese in Beijing, China; CHD: Chinese in Metropolitan Denver, Colorado; GIH: Gujarati Indians in Houston, Texas; JPT: Japanese in Tokyo, Japan; LWK: Luhya in Webuye, Kenya; MEX: Mexican ancestry in Los Angeles, California; MKK: Maasai in Kinyawa, Kenya; TSI: Tuscan in Italy; and YRI: Yoruba in Ibadan, Nigeria. The percent variance explained by each of the first two components is noted on the axis labels. (Also Figure 11 from the genotype QC report.)
38 What have we done so far?
39 Completed GWAS Analysis on HRS Traits
GWAS (29 x 2 = 58 total):
- Gait velocity (with and without osteoporosis, NHW and AA): 4
- Longevity (autosomal plus X chromosome): 1
- Hand grip strength (main effects and interaction with sex): 2
- Disability in aging (sex stratified): 2
- Educational attainment (quantitative and dichotomous, sex stratified): 4
- Subjective well-being (3 well-being outcomes): 3
- Fertility (sex stratified): 2
- Optimism/pessimism (sex stratified, sex adjusted, interaction with sex): 5
- Personality traits (six traits): 6
GWAS replication (selected SNPs or complex modeling):
- Body mass index and SNPs in the PCSK1 gene (NHW and AA)
- Systolic and diastolic blood pressure (NHW and AA)
- Hypertension (NHW and AA)
- Longevity and telomere SNPs
- CESD subscales (depression)
40 GWAS of Personality Traits
N = 8,113 European Americans
Big Five personality traits: Agreeableness, Extraversion, Conscientiousness, Neuroticism, Openness
41 GWAS of Personality Traits
Modeling: linear model with age and sex included as covariates
Personality trait = α + β1(age) + β2(sex) + β3(SNP dosage)
Analysis program: PLINK
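To show what the model estimates, here is a minimal ordinary-least-squares fit of trait = α + β1(age) + β2(sex) + β3(dosage) using the normal equations. It is an illustration only, not PLINK; the data are hypothetical and noise-free, so the known coefficients are recovered exactly. In a GWAS the same fit is repeated for every SNP, and β3 is the quantity tested.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_additive_model(trait, age, sex, dosage):
    """OLS estimates (alpha, b1, b2, b3) for trait = alpha + b1*age + b2*sex + b3*dosage."""
    x = [[1.0, a, s, d] for a, s, d in zip(age, sex, dosage)]
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(4)] for i in range(4)]
    xty = [sum(row[i] * y for row, y in zip(x, trait)) for i in range(4)]
    return solve(xtx, xty)


# Hypothetical, noise-free data generated with alpha=10, b1=0.05, b2=-1.5, b3=0.8.
age = [60, 65, 70, 75, 80, 62, 68, 74]
sex = [0, 1, 0, 1, 0, 1, 1, 0]
dosage = [0.0, 1.2, 2.0, 0.4, 1.0, 1.9, 0.1, 1.5]
trait = [10 + 0.05 * a - 1.5 * s + 0.8 * d for a, s, d in zip(age, sex, dosage)]
print([round(c, 3) for c in fit_additive_model(trait, age, sex, dosage)])
```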
42 QQ Plots for Neuroticism and Openness
[Figure: QQ plots for Neuroticism and Openness; x axis: expected -log10(p value); y axis: observed -log10(p value)]
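The points in a QQ plot are sorted observed p-values paired with the quantiles expected under the uniform null, both on the -log10 scale. This sketch uses the common (i + 0.5)/n plotting positions, which is an assumption; other conventions exist.

```python
import math


def qq_points(p_values):
    """(expected, observed) -log10 p-value pairs for a QQ plot under the uniform null."""
    observed = sorted(p_values)
    n = len(observed)
    expected = [(i + 0.5) / n for i in range(n)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


p_values = [0.8, 0.03, 0.5, 1e-6, 0.2]  # hypothetical association results
for e, o in qq_points(p_values):
    print(f"expected {e:.2f}  observed {o:.2f}")
```

Points on the diagonal indicate p-values behaving as expected under the null; a general lift away from the diagonal suggests inflation (for example, from stratification), whereas a few points departing only at the top right suggest true associations.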
44 Norms for Genetic Consortia
- Large cohorts (>50,000) for discovery
- Equally large replication samples
- Modest harmonization of traits
- Shared analysis plans
- Tons of conference calls
- Meta-analysis of results followed by bioinformatics
- Writing groups
- 1-4 years to publication
45 Resources for Investigators
46 User-friendly Resources
- dbGaP
- Candidate gene list
- Results database
47 dbGaP Website
Genetic data released April 03, …; … users have been approved to download the data
48 Summary of Projects through dbGaP
- GWAS of longevity, cholesterol levels, cognitive function, liver disease, leukemia, lupus, economic outcomes
- Gene-environment interactions on cognitive function, mood disorders, body mass
- Mendelian randomization to estimate effects of genetic, social, and physical risk factors on long-term health
- Control population for GWAS and other studies
- Assortative mating with respect to alcohol use
49 Creating a Bridge to Known Candidate Genes
The list of SNPs/genes was compiled from two lists sent to us:
List 1: Popular established genetic variants used most often by behavioral scientists, compiled by Terrie Moffitt and Avshalom Caspi, October 2011. Several genes were also added to the list because they were suggested by Robert Wallace, October 2011.
List 2: Cognitive age-related candidate genes, a list provided by Carol Prescott and Jack McArdle.
50 Example of Candidate Gene List
51 What's Next?
- Exome chip data (~16,300 samples from the 2006, 2008 & 2010 collection waves)
- Over 200,000 functional mutations measured at once
- Rare variants (need a completely different analysis approach, e.g., SKAT)
- Fancy multigene estimates of risk (risk index)
- Measured genetic heritability
- Gene-environment interactions
- Many consortia conference calls
52 Illumina Exome Chip
SNP set: number of candidates / number of successful designs
- Coding content: 275,… / …,094
- GWAS tag SNPs: 5,763 / 5,325
- Grid of common variants: 5,710 / 5,286
- Randomly selected synonymous SNPs: 5,000 / 4,651
- AIM African ancestry: 3,388 / 3,241
- AIM Native American ancestry: 1,… / not legible
- HLA tags: 2,536 / 2,459
- ESP requests: 1,… / not legible
- Fingerprint SNPs, microRNA target sites, mitochondrial variants, chromosome Y, indels: counts not legible
Additional notes: An additional set of 8,242 SNPs that were unique to the 1000 Genomes Project and populations underrepresented in the design was added. For 1,000 SNPs, assays were generated on both strands to facilitate QC efforts and future development of methods for genotyping rare variants.
53 Estimating the Genetic Relationship from Genome-wide SNPs
A_ijk is the genetic relationship between individuals j and k at SNP i
N is the number of SNPs
p_i is the frequency of allele B at SNP i
x_ij is an indicator variable that takes the value 0, 1, or 2 if the genotype of the jth individual at SNP i is bb, Bb, or BB, respectively
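The sketch below assumes the widely used estimator that matches these definitions, the form used by GCTA: A_ijk = (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i)), with A_jk the average of A_ijk over the N SNPs. For simplicity it applies the same expression on the diagonal; tools such as GCTA treat the diagonal differently. Allele frequencies are estimated from the sample unless supplied.

```python
def grm(genotypes, freqs=None):
    """Genetic relationship matrix A[j][k], averaged over informative SNPs.

    genotypes: rows = individuals, columns = SNPs, entries 0/1/2 copies of allele B.
    freqs: allele-B frequency per SNP; estimated from the sample when None.
    """
    n_people, n_snps = len(genotypes), len(genotypes[0])
    if freqs is None:
        freqs = [sum(row[i] for row in genotypes) / (2 * n_people) for i in range(n_snps)]
    a = [[0.0] * n_people for _ in range(n_people)]
    used = 0
    for i, p in enumerate(freqs):
        if not 0 < p < 1:
            continue  # monomorphic SNPs carry no information
        used += 1
        denom = 2 * p * (1 - p)
        z = [row[i] - 2 * p for row in genotypes]
        for j in range(n_people):
            for k in range(n_people):
                a[j][k] += z[j] * z[k] / denom
    return [[v / used for v in row] for row in a]


# Hypothetical genotypes: 3 people at 2 SNPs.
for row in grm([[0, 2], [1, 1], [2, 0]]):
    print([round(v, 2) for v in row])
```

With hundreds of thousands of SNPs, off-diagonal values for unrelated people cluster tightly around 0, as in the HRS histograms on the next slides.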
54 An Example of the Genetic Relationship Matrix
[Example matrix: pairwise relationships among Subject 1 through Subject 6 and beyond; values not legible]
55 Histogram of the Genetic Relatedness Among the HRS Subjects (N=12,367)
Mean = -8.382e-05; Median = 0.008; Min = [not legible]; Max = 0.559
56 Histogram of the Genetic Relatedness Used in Our First Heritability Study (N=4,367)
Mean = 0.010; Median = 0.011; Min = [not legible]; Max = 0.025
57 Estimate Heritability of BMI Using Unrelated Individuals
Square of z-score difference = α + βA_jk, where α = 2σ²_p and β = -2σ²_g
Heritability = σ²_g / σ²_p
[Figure; x axis: genetic relationship A_jk]
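This regression of squared phenotype differences on genetic relationship is a Haseman-Elston-style estimator. Because pairs that share more genome are expected to differ less, the slope is negative: E[(z_j - z_k)²] = 2σ²_p - 2σ²_g A_jk, so heritability = -β/α. The sketch fits the line by least squares; the helper that builds pairs and the example points (placed exactly on a line with h² = 0.5) are hypothetical.

```python
import statistics
from itertools import combinations


def pair_data(phenotypes, relationship):
    """(A_jk, (z_j - z_k)^2) for every pair j < k, after standardizing to z-scores."""
    mu, sd = statistics.mean(phenotypes), statistics.pstdev(phenotypes)
    z = [(y - mu) / sd for y in phenotypes]
    return [(relationship[j][k], (z[j] - z[k]) ** 2) for j, k in combinations(range(len(z)), 2)]


def he_regression(pairs):
    """Least-squares fit of (z_j - z_k)^2 = alpha + beta * A_jk; returns (alpha, beta, h2).

    With alpha = 2 * var_p and beta = -2 * var_g, heritability = var_g / var_p = -beta / alpha.
    """
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_d = sum(d for _, d in pairs) / n
    sxx = sum((a - mean_a) ** 2 for a, _ in pairs)
    sxy = sum((a - mean_a) * (d - mean_d) for a, d in pairs)
    beta = sxy / sxx
    alpha = mean_d - beta * mean_a
    return alpha, beta, -beta / alpha


# Hypothetical points lying exactly on alpha = 2, beta = -1, i.e. h2 = 0.5.
pairs = [(a, 2.0 - a) for a in (-0.02, -0.01, 0.0, 0.01, 0.02, 0.03)]
print([round(v, 3) for v in he_regression(pairs)])
```

In real data the squared differences are very noisy and A_jk for unrelated people spans only a narrow range (up to 0.025 in the sample above), so the slope is estimable only with very many pairs.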
58 Summary
- New GWAS requests are slowing down
- Ongoing connection to 10 GWAS consortia
- We have finished exome chip cleaning; working with dbGaP on submission; new consortia analyses
- Candidate gene list being reviewed
- We would like to find a way to offer the genetic relatedness matrix to investigators
- Should we begin a series of genetics webinars? What would be helpful to you?
More informationGenome-Wide Association Studies. Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey
Genome-Wide Association Studies Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey Introduction The next big advancement in the field of genetics after the Human Genome Project
More informationExploring genomic databases: Practical session "
Exploring genomic databases: Practical session Work through the following practical exercises on your own. The objective of these exercises is to become familiar with the information available in each
More informationAnalysis of genome-wide genotype data
Analysis of genome-wide genotype data Acknowledgement: Several slides based on a lecture course given by Jonathan Marchini & Chris Spencer, Cape Town 2007 Introduction & definitions - Allele: A version
More informationAxiom mydesign Custom Array design guide for human genotyping applications
TECHNICAL NOTE Axiom mydesign Custom Genotyping Arrays Axiom mydesign Custom Array design guide for human genotyping applications Overview In the past, custom genotyping arrays were expensive, required
More informationGlobal Screening Array (GSA)
Technical overview - Infinium Global Screening Array (GSA) with optional Multi-disease drop in (MD) The Infinium Global Screening Array (GSA) combines a highly optimized, universal genome-wide backbone,
More informationAppendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by
Appendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by author] Statistical methods: All hypothesis tests were conducted using two-sided P-values
More informationS SG. Metabolomics meets Genomics. Hemant K. Tiwari, Ph.D. Professor and Head. Metabolomics: Bench to Bedside. ection ON tatistical.
S SG ection ON tatistical enetics Metabolomics meets Genomics Hemant K. Tiwari, Ph.D. Professor and Head Section on Statistical Genetics Department of Biostatistics School of Public Health Metabolomics:
More informationBy the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs
(3) QTL and GWAS methods By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs Under what conditions particular methods are suitable
More informationSupplementary Figure 1. Study design of a multi-stage GWAS of gout.
Supplementary Figure 1. Study design of a multi-stage GWAS of gout. Supplementary Figure 2. Plot of the first two principal components from the analysis of the genome-wide study (after QC) combined with
More informationARTICLE Contrasting X-Linked and Autosomal Diversity across 14 Human Populations
ARTICLE Contrasting X-Linked and Autosomal Diversity across 14 Human Populations Leonardo Arbiza, 1,2 Srikanth Gottipati, 1,2 Adam Siepel, 1 and Alon Keinan 1, * Contrasting the genetic diversity of the
More informationMulti-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer
Multi-SNP Models for Fine-Mapping Studies: Application to an association study of the Kallikrein Region and Prostate Cancer November 11, 2014 Contents Background 1 Background 2 3 4 5 6 Study Motivation
More informationSupplementary Methods Illumina Genome-Wide Genotyping Single SNP and Microsatellite Genotyping. Supplementary Table 4a Supplementary Table 4b
Supplementary Methods Illumina Genome-Wide Genotyping All Icelandic case- and control-samples were assayed with the Infinium HumanHap300 SNP chips (Illumina, SanDiego, CA, USA), containing 317,503 haplotype
More informationComparison of the levels of diversity between coldspots (CS) and highly recombining regions (HRRs) for SNPs in the FCQ data set.
Supplementary Figure 1 Comparison of the levels of diversity between coldspots (CS) and highly recombining regions (HRRs) for SNPs in the FCQ data set. Odds ratios (ORs) are computed to compare SNP density
More informationStructure, Measurement & Analysis of Genetic Variation
Structure, Measurement & Analysis of Genetic Variation Sven Cichon, PhD Professor of Medical Genetics, Director, Division of Medcial Genetics, University of Basel Institute of Neuroscience and Medicine
More informationCS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016
CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 Topics Genetic variation Population structure Linkage disequilibrium Natural disease variants Genome Wide Association Studies Gene
More informationImputation. Genetics of Human Complex Traits
Genetics of Human Complex Traits GWAS results Manhattan plot x-axis: chromosomal position y-axis: -log 10 (p-value), so p = 1 x 10-8 is plotted at y = 8 p = 5 x 10-8 is plotted at y = 7.3 Advanced Genetics,
More informationA genome wide association study of metabolic traits in human urine
Supplementary material for A genome wide association study of metabolic traits in human urine Suhre et al. CONTENTS SUPPLEMENTARY FIGURES Supplementary Figure 1: Regional association plots surrounding
More informationAlkes Price Harvard School of Public Health January 24 & January 26, 2017
EPI 511, Advanced Population and Medical Genetics Week 1: Intro + HapMap / 1000 Genomes Linkage Disequilibrium Alkes Price Harvard School of Public Health January 24 & January 26, 2017 EPI 511: Course
More informationPLINK gplink Haploview
PLINK gplink Haploview Whole genome association software tutorial Shaun Purcell Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA Broad Institute of Harvard & MIT, Cambridge,
More informationИСПОЛЬЗОВАНИ Е ЧИПАТОРОВ. Клиническая лаборатория
ИСПОЛЬЗОВАНИ Е ЧИПАТОРОВ Клиническая лаборатория 1 2 DISTRIBUTED VS CONSOLIDATED One machine per lab Optimized usage time and PI control Sample prep and data analysis is done inside the lab All equipment
More informationGenomics Resources in WHI. WHI ( ) Extension Study Steering Committee Meeting Seattle, WA May 05-06, 2011
Genomics Resources in WHI WHI (2010-2015) Extension Study Steering Committee Meeting Seattle, WA May 05-06, 2011 WHI Genomic Resources in dbgap Outcomes and traits in AA and Hispanics GWAS-SHARe Sequencing-ESP
More informationAmapofhumangenomevariationfrom population-scale sequencing
doi:.38/nature9534 Amapofhumangenomevariationfrom population-scale sequencing The Genomes Project Consortium* The Genomes Project aims to provide a deep characterization of human genome sequence variation
More informationMining GWAS Catalog & 1000 Genomes Dataset. Segun Fatumo
Mining GWAS Catalog & 1000 Genomes Dataset Segun Fatumo What is GWAS Catalog NHGRI GWA Catalog www.genome.gov/gwastudies Citation How to cite the NHGRI GWAS Catalog: Hindorff LA, MacArthur J (European
More informationTopics in Statistical Genetics
Topics in Statistical Genetics INSIGHT Bioinformatics Webinar 2 August 22 nd 2018 Presented by Cavan Reilly, Ph.D. & Brad Sherman, M.S. 1 Recap of webinar 1 concepts DNA is used to make proteins and proteins
More informationSupplementary Figures
Supplementary Figures Supplementary Figure 1: Loci associated with 2 hr glucose during pregnancy imputed to HAPMAP (a) and 1000 Genomes (b). Peak of association is in the first intron of HKDC1. Black bars
More informationSingle Nucleotide Polymorphisms (SNPs)
Single Nucleotide Polymorphisms (SNPs) Sequence variations Single nucleotide polymorphisms Insertions/deletions Copy number variations (large: >1kb) Variable (short) number tandem repeats Single Nucleotide
More informationProstate Cancer Genetics: Today and tomorrow
Prostate Cancer Genetics: Today and tomorrow Henrik Grönberg Professor Cancer Epidemiology, Deputy Chair Department of Medical Epidemiology and Biostatistics ( MEB) Karolinska Institutet, Stockholm IMPACT-Atanta
More informationComputational Workflows for Genome-Wide Association Study: I
Computational Workflows for Genome-Wide Association Study: I Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 16, 2014 Outline 1 Outline 2 3 Monogenic Mendelian Diseases
More informationPopulation structure, heritability, and polygenic risk
Population structure, heritability, and polygenic risk Alicia Martin Daly Lab October 18, 2016 armartin@broadinstitute.org @genetisaur Project goals Call local ancestry in large case/control PTSD cohort
More informationCrash-course in genomics
Crash-course in genomics Molecular biology : How does the genome code for function? Genetics: How is the genome passed on from parent to child? Genetic variation: How does the genome change when it is
More informationEvidence of selection on human stature inferred from spatial distribution of allele frequencies.
Evidence of selection on human stature inferred from spatial distribution of allele frequencies. 1 Davide Piffer Abstract Spatial patterns of allele frequencies reveal a clear signal of natural (or sexual)
More informationEfficient Genomewide Selection of PCA-Correlated tsnps for Genotype Imputation
Efficient Genomewide Selection of PCA-Correlated tsnps for Genotype Imputation Asif Javed 1,2, Petros Drineas 2, Michael W. Mahoney 3 and Peristera Paschou 4 1 Computational Biology Center, IBM T. J. Watson
More informationHuman Genetics and Gene Mapping of Complex Traits
Human Genetics and Gene Mapping of Complex Traits Advanced Genetics, Spring 2018 Human Genetics Series Thursday 4/5/18 Nancy L. Saccone, Ph.D. Dept of Genetics nlims@genetics.wustl.edu / 314-747-3263 What
More informationCONTRACTING ORGANIZATION: Icahn School of Medicine at Mount Sinai New York, NY 10029
AWARD NUMBER: W81XWH-14-1-0399 TITLE: Molecular & Genetic Investigation of Tau in Chronic Traumatic Encephalopathy (Log No. 13267017) PRINCIPAL INVESTIGATOR: John F. Crary, MD-PhD CONTRACTING ORGANIZATION:
More informationARTICLE Population-Genetic Properties of Differentiated Human Copy-Number Polymorphisms
ARTICLE Population-Genetic Properties of Differentiated Human Copy-Number Polymorphisms Catarina D. Campbell, 1 Nick Sampas, 2 Anya Tsalenko, 2 Peter H. Sudmant, 1 Jeffrey M. Kidd, 1,3 Maika Malig, 1 Tiffany
More informationGenome-Wide Associa/on Studies: History, Current Approaches, and Future Opportuni/es. Addie Thompson Genomics,
Genome-Wide Associa/on Studies: History, Current Approaches, and Future Opportuni/es Addie Thompson Genomics, 11-15-2016 Outline History and terminology Sta5s5cs and breeding Linkage and associa5on analysis,
More informationSupplementary Figures
Supplementary Figures 1 Supplementary Figure 1. Analyses of present-day population differentiation. (A, B) Enrichment of strongly differentiated genic alleles for all present-day population comparisons
More informationLecture 2: Population Structure Advanced Topics in Computa8onal Genomics
Lecture 2: Population Structure 02-715 Advanced Topics in Computa8onal Genomics 1 What is population structure? Popula8on Structure A set of individuals characterized by some measure of gene8c dis8nc8on
More informationPersonal Genomics Platform White Paper Last Updated November 15, Executive Summary
Executive Summary Helix is a personal genomics platform company with a simple but powerful mission: to empower every person to improve their life through DNA. Our platform includes saliva sample collection,
More informationEvaluation of a multipoint method for imputing genotypes using HapMap III
Mathematical Statistics Stockholm University Evaluation of a multipoint method for imputing genotypes using HapMap III Emil Rehnberg Examensarbete 2009:5 Postal address: Mathematical Statistics Dept. of
More informationGenetic data concepts and tests
Genetic data concepts and tests Cavan Reilly September 21, 2018 Table of contents Overview Linkage disequilibrium Quantifying LD Heatmap for LD Hardy-Weinberg equilibrium Genotyping errors Population substructure
More informationIntroduction to Quantitative Genomics / Genetics
Introduction to Quantitative Genomics / Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics September 10, 2008 Jason G. Mezey Outline History and Intuition. Statistical Framework. Current
More informationImproving the accuracy and efficiency of identity by descent detection in population
Genetics: Early Online, published on March 27, 2013 as 10.1534/genetics.113.150029 Improving the accuracy and efficiency of identity by descent detection in population data Brian L. Browning *,1 and Sharon
More information