Detecting gene-gene interactions in high-throughput genotype data through a Bayesian clustering procedure

Detecting gene-gene interactions in high-throughput genotype data through a Bayesian clustering procedure
Sui-Pi Chen and Guan-Hua Huang
Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
Email: ghuang@stat.nctu.edu.tw
2012.11.20

Motivation
[Figure: diagram labeled with cultural factors, common environment, individual environment, and polygenic background.]

Motivation: Single nucleotide polymorphism (SNP)
- A DNA sequence variation
- Two alleles: A and a
- SNPs are treated as categorical features with three possible values: AA, Aa, aa.
- Relabel AA (2), Aa (1), aa (0).
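The relabeling above can be sketched in a few lines of Python. This is only an illustration: the helper name `encode_genotypes`, the treatment of "aA" as a heterozygote, and the use of `None` for unrecognized calls are assumptions, not part of the slides.

```python
# Hypothetical lookup table for the AA (2), Aa (1), aa (0) relabeling.
# Accepting "aA" as a heterozygote is an assumption made for convenience.
GENOTYPE_CODES = {"AA": 2, "Aa": 1, "aA": 1, "aa": 0}


def encode_genotypes(genotypes):
    """Return the numeric code for each genotype; unknown calls become None."""
    return [GENOTYPE_CODES.get(g) for g in genotypes]


sample = ["AA", "Aa", "aa", "aA", "NN"]
print(encode_genotypes(sample))
```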

Motivation: What is gene-gene interaction (epistasis)?
- The effects of a given gene on a biological trait are masked or enhanced by one or more other genes.
- An increasing body of evidence suggests that epistasis plays an important role in susceptibility to complex human diseases, such as type 1 diabetes, breast cancer, obesity, and schizophrenia.
- Further evidence confirms that some loci display interaction effects without displaying marginal effects.
- Analyzing thousands of genes from high-throughput SNP arrays complicates the problem further because of the computational burden.
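An interaction effect without a marginal effect can be illustrated with a small worked example. The penetrance table below is hypothetical and chosen only for illustration; it assumes two independent SNPs, each in Hardy-Weinberg equilibrium with a minor allele frequency of 0.5. Disease risk depends strongly on the genotype pair, yet averaging over either SNP gives the same risk for every genotype of the other.

```python
# Genotype frequencies for codes 0 (aa), 1 (Aa), 2 (AA) under
# Hardy-Weinberg equilibrium with MAF = 0.5 (an assumption of this sketch).
GENOTYPE_FREQ = [0.25, 0.5, 0.25]

# Hypothetical penetrance table P(disease | SNP1 = row, SNP2 = column).
PENETRANCE = [
    [0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 0.0],
]


def marginal_penetrance_snp1(penetrance, freq):
    """Average disease risk for each SNP1 genotype, summing over SNP2."""
    return [sum(f * p for f, p in zip(freq, row)) for row in penetrance]


def marginal_penetrance_snp2(penetrance, freq):
    """Average disease risk for each SNP2 genotype, summing over SNP1."""
    return [sum(f * p for f, p in zip(freq, col)) for col in zip(*penetrance)]


print(marginal_penetrance_snp1(PENETRANCE, GENOTYPE_FREQ))
print(marginal_penetrance_snp2(PENETRANCE, GENOTYPE_FREQ))
```

Both marginal risk vectors are flat, so a single-locus test has nothing to detect even though the joint table is far from constant.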

Methods for detecting gene-gene interaction
Approaches to detecting epistasis:
- Traditional method
- Two-stage methods
- Data mining
- Bayesian model selection

Methods for detecting gene-gene interaction
- Traditional method
  - Logistic regression, contingency table χ² test
  - It does not include interaction terms without a main effect.
  - For high-dimensional data with high-order interactions, the contingency table has many empty cells.
- Two-stage method
  - A subset of loci that pass some single-locus significance threshold is chosen as the filtered subset.
  - An exhaustive search of all two-locus or higher-order interactions is carried out on the filtered subset.
- Data-mining method
  - Nonparametric
  - Multifactor Dimensionality Reduction (MDR)
- Bayesian model selection
  - Does not do an exhaustive search
  - Bayesian epistasis association mapping (BEAM)
  - Algorithm via Bayesian Clustering to Detect Epistasis (ABCDE)
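The two-stage idea described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular package: the function names are hypothetical, the filter uses a raw Pearson chi-square statistic rather than a p-value threshold, and genotypes are assumed to be coded 0/1/2 with labels 1 (case) and 0 (control), with at least one case and one control.

```python
from itertools import combinations


def chi_square_stat(cells_cases, cells_controls):
    """Pearson chi-square statistic for a (cells x 2) contingency table."""
    n_cases, n_controls = sum(cells_cases), sum(cells_controls)
    total = n_cases + n_controls
    stat = 0.0
    for a, b in zip(cells_cases, cells_controls):
        row = a + b
        if row == 0:
            continue  # an empty genotype cell contributes nothing
        for observed, col_total in ((a, n_cases), (b, n_controls)):
            expected = row * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


def single_locus_stat(snp, labels):
    """Chi-square statistic of the 3 x 2 genotype-by-status table."""
    cases, controls = [0] * 3, [0] * 3
    for g, y in zip(snp, labels):
        (cases if y == 1 else controls)[g] += 1
    return chi_square_stat(cases, controls)


def pair_stat(snp_a, snp_b, labels):
    """Chi-square statistic of the 9 x 2 joint-genotype-by-status table."""
    cases, controls = [0] * 9, [0] * 9
    for ga, gb, y in zip(snp_a, snp_b, labels):
        (cases if y == 1 else controls)[3 * ga + gb] += 1
    return chi_square_stat(cases, controls)


def two_stage_search(snps, labels, threshold):
    """Stage 1: keep SNPs passing the single-locus filter.
    Stage 2: score every pair among the kept SNPs."""
    kept = [j for j, snp in enumerate(snps)
            if single_locus_stat(snp, labels) >= threshold]
    scores = {(i, j): pair_stat(snps[i], snps[j], labels)
              for i, j in combinations(kept, 2)}
    return kept, scores


labels = [1, 1, 1, 1, 0, 0, 0, 0]
snps = [
    [2, 2, 2, 2, 0, 0, 0, 0],  # strong single-locus signal
    [1, 1, 1, 1, 1, 1, 1, 1],  # no single-locus signal
    [2, 2, 2, 0, 0, 0, 0, 2],  # weak single-locus signal
]
kept, scores = two_stage_search(snps, labels, threshold=1.5)
print(kept, scores)
```

The second SNP never reaches stage 2, which is the weakness of two-stage filtering: a locus that acts only through interaction is discarded before any pair is examined.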

Methods for detecting gene-gene interaction: BEAM algorithm
BEAM algorithm (Zhang and Liu, 2007)
- Case-control study
- Metropolis-Hastings algorithm
- Posterior probabilities that
  - each SNP is not associated with the disease
  - each SNP is associated with the disease
  - each SNP is involved with other SNPs in epistasis
- B statistic
  - Tests each SNP or set of SNPs for significant association
  - Asymptotically distributed as a shifted χ² with $3^k - 1$ degrees of freedom, where $k$ is the number of SNPs in the set
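The degrees of freedom grow quickly with the size of the SNP set, because k SNPs with three genotypes each give 3^k joint genotype combinations. A short sketch of the arithmetic (the function name is hypothetical):

```python
def b_statistic_df(k):
    """Degrees of freedom 3**k - 1 for a set of k SNPs, three genotypes each."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 3 ** k - 1


for k in range(1, 5):
    print(k, b_statistic_df(k))
```

The same count explains why contingency tables for high-order interactions tend to have many empty cells.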

Methods for detecting gene-gene interaction: BEAM algorithm
- $I = (I_1, \ldots, I_L)$ indicates the group membership of the SNPs, with $I_j \in \{0, 1, 2\}$.
- BEAM found no significant interactions in the AMD data.
[Figure: diagram of SNP effects on disease.]

Proposed method: Algorithm via Bayesian Clustering to Detect Epistasis (ABCDE)
[Figure: diagrams of SNP effects on disease, with independent effects marked. Panels: (a) BEAM, (b) ABCDE.]

Proposed method: ABCDE algorithm
- Bayesian clustering approach
- Case-control study
- Gibbs weighted Chinese restaurant (GWCR) procedure
- Posterior probabilities that
  - each SNP is associated with the disease
  - each cluster of SNPs is associated with the disease
- Permutation test for the candidate disease subsets selected by ABCDE
  - 10-fold cross-validation
  - The heart of the MDR approach: dimension reduction
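The slides do not give the details of the permutation test, so the following is only a generic sketch of a label-permutation test for a case-control statistic. The statistic used here (absolute difference in mean genotype code) is a stand-in chosen for brevity; it is not the cross-validated MDR-style measure the slide alludes to, and all names are hypothetical.

```python
import random


def mean_genotype_difference(genotypes, labels):
    """Stand-in statistic: |mean genotype in cases - mean genotype in controls|."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(genotypes, labels, statistic,
                        n_permutations=200, seed=0):
    """Estimate a p-value by shuffling case-control labels.

    The observed statistic is compared with its values under random
    relabeling; adding 1 to numerator and denominator keeps the estimate
    strictly positive.
    """
    rng = random.Random(seed)
    observed = statistic(genotypes, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # preserves the number of cases and controls
        if statistic(genotypes, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)


labels = [1] * 10 + [0] * 10
associated = [2] * 10 + [0] * 10   # genotype tracks disease status
unrelated = [1] * 20               # genotype carries no information
print(permutation_p_value(associated, labels, mean_genotype_difference))
print(permutation_p_value(unrelated, labels, mean_genotype_difference))
```

For a candidate SNP subset, the same loop applies with `statistic` replaced by whatever set-level measure is being validated.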

Proposed method: ABCDE
Model: product partition model

Simulation
To evaluate the performance of ABCDE, we simulated data from 10 different models:
- Single-set models (models 1-5)
- Multiple-set models (models 6-8)
- LD-extend models (models 9-10)
Comparison between ABCDE and BEAM.

Simulation: Single-set models
[Figure: diagrams linking SNP sets to disease for Models 1 to 5. SNP labels as extracted, in order: Models 1-3: 1, 2, (1,2), (1,2); Model 4: (1,2,3); Model 5: (1,2,3,4,5,6).]

Simulation: Results for single-set models
[Figure: power (0.0 to 1.0) versus MAF (0.05, 0.1, 0.2, 0.5) for ABCDE and BEAM, one panel each for Models 1 to 5.]

Simulation: Multiple-set models and LD-extend models
[Figure: diagrams linking SNP sets to disease for Models 6 to 10. SNP labels as extracted, in order: Models 6-8: (1,2), (3,4), (1,2), (3,4,5), (1,2,3), 4, 5; Models 9-10: (1,2), (3,4), 5, 6, (1,2), (3,4), 5, 6, 7, 8.]

Simulation: Results for multiple-set models and LD-extend models
[Figure: power (0.0 to 1.0) versus MAF (0.05, 0.1, 0.2, 0.5), one panel each for Models 6 to 10.]

Real data
- Goal: detect pairwise and/or higher-order SNP interactions and understand the genetic architecture of schizophrenia through ABCDE and BEAM.
- 1512 individuals, including 912 schizophrenia cases and 600 controls.

Gene | Chr | Number of SNPs
DISC1 | 1q | 16
LMBRD1 | 6q | 11
DPYSL2 | 8p | 14
TRIM35 | 8p | 10
PTK2B | 8p | 19
NRG1 | 8p | 10
DAO | 12q | 5
G72 | 13q | 5
RASD2 | 22q | 4
CACNG2 | 22q | 6

Real data: Flow chart of quality control
1. Input: 1512 samples (912 cases, 600 controls), 100 SNPs (10 genes)
2. Quality control (Haploview)
   - Exclusion criterion for samples: individuals with GCR < 70%
   - Exclusion criteria for SNPs: HW p-value < 0.0001, GCR < 75%, MAF < 0.005
3. Output: 1509 samples (909 cases, 600 controls), 95 SNPs (10 genes)
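The SNP-level exclusion criteria can be sketched as a simple filter. This is an illustration under stated assumptions, not a reproduction of Haploview: GCR is taken to mean genotype call rate, missing calls are represented by `None`, the Hardy-Weinberg p-value comes from a 1-degree-of-freedom chi-square approximation rather than whatever test Haploview applies, and all function names are hypothetical.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Minor allele frequency from 0/1/2 codes, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def hwe_p_value(genotypes):
    """Chi-square (1 df) approximation to the Hardy-Weinberg test."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    p = sum(called) / (2 * n)
    q = 1 - p
    expected = {2: n * p * p, 1: 2 * n * p * q, 0: n * q * q}
    if min(expected.values()) == 0:
        return 1.0  # monomorphic SNP: the test is not informative
    chi2 = sum((called.count(g) - e) ** 2 / e for g, e in expected.items())
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def snp_passes_qc(genotypes, min_gcr=0.75, min_maf=0.005, min_hwe_p=1e-4):
    """Apply the three SNP exclusion thresholds listed on the slide."""
    if call_rate(genotypes) < min_gcr:
        return False
    if minor_allele_frequency(genotypes) < min_maf:
        return False
    return hwe_p_value(genotypes) >= min_hwe_p


in_hwe = [0] * 25 + [1] * 50 + [2] * 25
all_het = [1] * 100
print(snp_passes_qc(in_hwe), snp_passes_qc(all_het))
```

The call-rate check runs first so that a SNP with no called genotypes is rejected before any frequency is computed.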

Real data: Results
Table: Identified significant epistatic sets by BEAM using all 95 SNPs.

SNP | Chr. | Gene | B-statistic (p-value) | BA (p-value) | PA (p-value)
rsdisc1p-3 | 1q | DISC1 | 55.19 (9.89 × 10^-11) | 0.5944 (0) | 0.5557 (0.018)
rsdisc1-23 | 1q | DISC1 | 31.31 (1.51 × 10^-5) | 0.5705 (0) | 0.5416 (0.224)
rsdpysl-4 | 8p | DPYSL | 21.26 (0.002) | 0.5561 (0) | 0.5156 (0.399)
rstrim35-5 | 8p | TRIM | 32.23 (9.52 × 10^-6) | 0.5693 (0) | 0.5296 (0.386)
rsnrg1p-7 | 8p | NRG1 | 59.88 (9.44 × 10^-12) | 0.5996 (0) | 0.5815 (0.024)
rsg72-e-2 | 13q | G72 | 43.16 (4.03 × 10^-8) | 0.5839 (0) | 0.5695 (0.029)

Real data: Results
Table: Identified significant epistatic sets by ABCDE using all 95 SNPs.

SNPs | Chr. | Gene | B-statistic (p-value) | BA (p-value) | PA (p-value)
rsdpysl-15, rssdpysl2-11 | 8p | DPYSL | 58.48 (4 × 10^-6) | 0.5304 (0.01) | 0.5933 (0.005)
rsstrim35-1, rstrim35-2, rstrim35-5 | 8p | TRIM35 | 127.97 (0) | 0.5647 (0) | 0.5146 (0.412)
rssdpysl2-1, rsdpysl-3, rsdpysl-4 | 8p | DPYSL2 | 81.63 (0.016) | 0.5678 (0) | 0.6619 (0)
rsdao-6, rsdao-7, rsdao-8 | 12q | DAO | 216.99 (0) | 0.582 (0) | 0.6531 (0)
rsg72-e-1, rsg72-e-2, rsg72-13 | 13q | G72 | 91.00 (5.32 × 10^-4) | 0.5866 (0) | 0.575 (0.006)
rssdisc1-1, rsdisc1p-3, rsdisc1-23, rsdisc1-27 | 1q | DISC1 | 251.41 (0) | 0.6325 (0) | 0.6178 (0)
rssdpysl2-1, rsdpysl-3, rsdpysl-4, rssdpysl2-5 | 8p | DPYSL2 | 197.15 (2.3 × 10^-5) | 0.5686 (0) | 0.6185 (0)
rsnrg1p-6, rsnrg1p-7, rscacng2-16, rscacng2-15 | (8p, 22q) | NRG1, CACNG2 | 86.96 (1) | 0.5962 (0) | 0.5642 (0.05)
rsstrim35-1, rstrim35-2, rstrim35-4, rstrim35-5, rstrim35-6 | 8p | TRIM35 | 354.85 (1) | 0.572 (0) | 0.5255 (0.403)
rsdao-6, rsdao-7, rsdao-8, rscacng2-2, rscacng2p-1, rscncng2-18 | (12q, 22q) | DAO, CACNG2 | 171.62 (1) | 0.5737 (0) | 0.6137 (0)

Conclusion
- We propose the ABCDE algorithm, which can characterize all epistatic (interaction) effects, regardless of the number of groups.
- We further develop permutation tests to validate the disease association of SNP subsets selected by ABCDE.
- Applying ABCDE to the real data, we identify several known and novel schizophrenia-associated SNPs and sets of SNPs.
- We may develop a parallel implementation of ABCDE to make the algorithm suitable for large-scale epistatic interaction mapping, including genome-wide studies with hundreds of thousands of markers.