Advanced Introduction to Machine Learning

Size: px
Start display at page:

Download "Advanced Introduction to Machine Learning"

Transcription

1 Advanced Introduction to Machine Learning 10715, Fall 2014 Structured Sparsity, with application in Computational Genomics Eric Xing Lecture 3, September 15, 2014 Reading: Eric CMU,

2 Structured Sparsity =argminl(x; Y; )+Ð( ) Sparsity Ð( ) = P i j ij Group sparsity y n-1 x 1 x 2 x 3 Ð( ) = P c j G c j 2 = P c q P i2g c 2i n x n-1 Total variation sparsity Ð( ) = P c j G c j TV = P c P i2g c j i i 1 j x n Eric CMU,

3 Genetic Basis of Diseases ACTCGTACGTAGACCTAGCATTACGCAATAATGCGA ACTCGAACCTAGACCTAGCATTACGCAATAATGCGA Healthy TCTCGTACGTAGACGTAGCATTACGCAATTATCCGA ACTCGAACCTAGACCTAGCATTACGCAATTATCCGA ACTCGTACGTAGACGTAGCATAACGCAATAATGCGA TCTCGTACCTAGACGTAGCATAACGCAATAATCCGA ACTCGAACCTAGACCTAGCATAACGCAATTATCCGA Sick Single nucleotide polymorphism (SNP) Causal (or "associated") SNP Eric CMU,

4 Genetic Association Mapping Data Genotype Phenotype Standard Approach A.... T.. G..... C T..... A.. G. A.... A.. C..... C T..... A.. G. causal SNP T.... T.. G..... G T..... T.. C. A.... A.. C..... C T..... T.. C. A.... T.. G..... G A..... A.. G. T.... T.. C..... G A..... A.. C. A.... A.. C..... C A..... T.. C. a univariate phenotype: e.g., disease/control Cancer: Dunning et al Diabetes: Dupuis et al Atopic dermatitis: Esparza-Gordillo et al Arthritis: Suzuki et al Eric CMU,

5 Genetic Basis of Complex Diseases ACTCGTACGTAGACCTAGCATTACGCAATAATGCGA ACTCGAACCTAGACCTAGCATTACGCAATAATGCGA Healthy TCTCGTACGTAGACGTAGCATTACGCAATTATCCGA ACTCGAACCTAGACCTAGCATTACGCAATTATCCGA ACTCGTACGTAGACGTAGCATAACGCAATAATGCGA TCTCGTACCTAGACGTAGCATAACGCAATAATCCGA ACTCGAACCTAGACCTAGCATAACGCAATTATCCGA Cancer Causal SNPs Eric CMU,

6 Genetic Basis of Complex Diseases ACTCGTACGTAGACCTAGCATTACGCAATAATGCGA Healthy ACTCGAACCTAGACCTAGCATTACGCAATAATGCGA TCTCGTACGTAGACGTAGCATTACGCAATTATCCGA ACTCGAACCTAGACCTAGCATTACGCAATTATCCGA Biological mechanism ACTCGTACGTAGACGTAGCATAACGCAATAATGCGA TCTCGTACCTAGACGTAGCATAACGCAATAATCCGA ACTCGAACCTAGACCTAGCATAACGCAATTATCCGA Cancer Causal SNPs Eric CMU,

7 Genetic Basis of Complex Diseases Association to intermediate phenotypes Intermediate Phenotype Gene expression ACTCGTACGTAGACCTAGCATTACGCAATAATGCGA Healthy ACTCGAACCTAGACCTAGCATTACGCAATAATGCGA TCTCGTACGTAGACGTAGCATTACGCAATTATCCGA ACTCGAACCTAGACCTAGCATTACGCAATTATCCGA ACTCGTACGTAGACGTAGCATAACGCAATAATGCGA TCTCGTACCTAGACGTAGCATAACGCAATAATCCGA ACTCGAACCTAGACCTAGCATAACGCAATTATCCGA Causal SNPs Clinical records Cancer Eric CMU,

8 Genetic Association for Asthma Clinical Traits TCGACGTTTTACTGTACAATT Subnetworks for lung physiology Statistical challenge How to find associations to a multivariate entity in a graph? Subnetwork for quality of life Eric CMU,

9 Gene Expression Trait Analysis TCGACGTTTTACTGTACAATT Samples Genes Statistical challenge How to find associations to a multivariate entity in a tree? Eric CMU,

10 Structured Association Traditional Approach causal SNP ACGTTTTACTGTACAATT Association with Phenome ACGTTTTACTGTACAATT a univariate phenotype: gene expression level Multivariate complex syndrome (e.g., asthma) age at onset, history of eczema genome wide expression profile Eric CMU,

11 Sparse Associations Pleotropic effects Epistatic effects Eric CMU,

12 Structured Sparse Association : a New Paradigm Standard Approach Consider one phenotype at a time vs. New Approach Consider multiple correlated phenotypes (phenome) jointly Phenotypes Phenome Eric CMU,

13 Sparse Learning Linear Model: Lasso (Sparse Linear Regression) [R.Tibshirani 96] Why sparse solution? penalizing constraining Eric CMU,

14 Multi-Task Extension Multi-Task Linear Model: Input: Output: Coefficients for k-th task: Coefficient Matrix: Coefficients for a variable (2 nd ) Coefficients for a task (2 nd ) Eric CMU,

15 Outline Background: Sparse multivariate regression for disease association studies Structured association a new paradigm Association to a graph-structured phenome Graph-guided fused lasso (Kim & Xing, PLoS Genetics, 2009) Association to a tree-structured phenome Tree-guided group lasso (Kim & Xing, ICML 2010) Eric CMU,

16 Multivariate Regression for Single- Trait Association Analysis Trait Genotype T G A A C C A T G A A G T A 2.1 x = Association Strength? y = X x β Eric CMU,

17 Multivariate Regression for Single- Trait Association Analysis Trait Genotype T G A A C C A T G A A G T A 2.1 x = Association Strength Many non-zero associations: Which SNPs are truly significant? Eric CMU,

18 Lasso for Reducing False Positives (Tibshirani, 1996) Trait Genotype Association Strength 2.1 = T G A A C C A T G A A G T A x Lasso Penalty for sparsity J + β j j 1 Many zero associations (sparse results), but what if there are multiple related traits? Eric CMU,

19 Multivariate Regression for Multiple- Trait Association Analysis Genotype Association Strength (3.4, 1.5, 2.1, 0.9, 1.8) Allergy Lung physiology = T G A A C C A T G A A G T A LD Synthetic lethal x Association strength between SNP j and Trait i: β j,i + β j,i i, j How to combine information across multiple traits to increase the power? Eric CMU,

20 Multivariate Regression for Multiple- Trait Association Analysis Trait Genotype Association Strength (3.4, 1.5, 2.1, 0.9, 1.8) Allergy Lung physiology = T G A A C C A T G A A G T A x Association strength between SNP j and Trait i: β j,i + β j,i i, j + We introduce graph-guided fusion penalty Eric CMU,

21 Multiple-trait Association: Graph-Constrained Fused Lasso Step 1: Thresholded correlation graph of phenotypes Step 2: Graph constrained fused lasso ACGTTTTACTGTACAATT Fusion Lasso Penalty Graph-constrained fusion penalty Eric CMU,

22 Fusion Penalty SNP j ACGTTTTACTGTACAATT Association strength between SNP j and Trait k: β jk Association strength between SNP j and Trait m: β jm Trait m Trait k Fusion Penalty: β jk - β jm For two correlated traits (connected in the network), the association strengths may have similar values. Eric CMU,

23 Graph-Constrained Fused Lasso Overall effect ACGTTTTACTGTACAATT Fusion effect propagates to the entire network Association between SNPs and subnetworks of traits Eric CMU,

24 Multiple-trait Association: Graph-Weighted Fused Lasso Overall effect ACGTTTTACTGTACAATT Subnetwork structure is embedded as a densely connected nodes with large edge weights Edges with small weights are effectively ignored Eric CMU,

25 Estimating Parameters Quadratic programming formulation Graph-constrained fused lasso Graph-weighted fused lasso Many publicly available software packages for solving convex optimization problems can be used Eric CMU,

26 Simulation Results 50 SNPs taken from HapMap chromosome 7, CEU population 10 traits SNPs Trait Thresholded Trait Correlation Correlation Network Matrix Phenotypes High association No association True Regression Coefficients Single SNP-Single Trait Test Significant at α = 0.01 Lasso Graph-guided Fused Lasso Eric CMU,

27 Simulation Results Eric CMU,

28 Association to a Tree-structured Phenome TCGACGTTTTACTGTACAATT Eric CMU,

29 Tree-Guided Group Lasso In a simple case of two genes h h Genotypes Low height Tight correlation Joint selection Genotypes Large height Weak correlation Separate selection Eric CMU,

30 Tree-Guided Group Lasso In a simple case of two genes h Select the child nodes jointly or separately? Tree-guided group lasso L 1 penalty Lasso penalty Separate selection Elastic net L 2 penalty Group lasso Joint selection Eric CMU,

31 Tree-Guided Group Lasso For a general tree h 2 h 1 Select the child nodes jointly or separately? Tree-guided group lasso Joint selection Separate selection Eric CMU,

32 Tree-Guided Group Lasso For a general tree h 2 h 1 Select the child nodes jointly or separately? Tree-guided group lasso Joint selection Separate selection Eric CMU,

33 Balanced Shrinkage Previously, in Jenatton, Audibert & Bach, 2009 Eric CMU,

34 Estimating Parameters Second-order cone program Many publicly available software packages for solving convex optimization problems can be used Eric CMU,

35 Illustration with Simulated Data Phenotypes SNPs High association True association strengths Lasso Tree guided group lasso No association Eric CMU,

36 Yeast eqtl Analysis Hierarchical clustering tree Phenotypes SNPs High association No association Single Marker Single Trait Test Tree guided group lasso Eric CMU,

37 Ultimately Pleotropic effects Epistatic effects Eric CMU,

38 Structured Input/Output-Lasso io lasso arg min Output Lasso Input structure: penalty: group within selection group sparsity of of SNPs correlated across multiple epistatic correlated SNPs traits [Lee, Zhu and Xing, submitted 2010] K N p k k Yi j X ij k 1 i 1 j 1 ( r, s) U Z k rs i, rs ( r. s) U j 1 k 1 k m ( r, s) S This full model incorporates input/output structure of the dataset as well as epistatic effects guided by genetic interaction networks U S m : m j k k k 2 j k rs m k j k 2 rs : genetic interaction networks th cluster in SNP network Eric CMU,

39 Sensitivity and Specificity varying the number of SNPs Marginal SNP: Methods taking advantage of output structures outperforms others. Epistatic SNP: Methods taking advantage of input structures outperforms others. IO-Lasso outperforms other methods for detecting both marginal & epsitatic eqtls For each number of SNPs, we show the average of the performance with 5 different simulated data Eric CMU, /15/

40 Computation Time Eric CMU,

41 Proximal Gradient Descent Original Problem: Approximatio n Problem: Gradient of the Approximation : Eric CMU,

42 Geometric Interpretation Smooth approximation Uppermost Line Nonsmooth Uppermost Line Smooth Eric CMU,

43 Convergence Rate Theorem: If we require and set, the number of iterations is upper bounded by: Remarks: state of the art IPM method for for SOCP converges at a rate Eric CMU,

44 Multi-Task Time Complexity Pre-compute: Per-iteration Complexity (computing gradient) Tree: IPM for SOCP Proximal-Gradient Graph: IPM for SOCP Proximal-Gradient Proximal-Gradient: Independent of Sample Size Eric CMU, Linear 2014 in #.of Tasks 44

45 Experiments Multi-task Graph Structured Sparse Learning (GFlasso) Eric CMU,

46 Experiments Multi-task Tree-Structured Sparse Learning (TreeLasso) Eric CMU,

47 Conclusions Novel statistical methods for joint association analysis to correlated phenotypes Graph-structured phenome : graph-guided fused lasso Tree-structured phenome : tree-guided group lasso Advantages Greater power to detect weak association signals Fewer false positives Joint association to multiple correlated phenotypes Eric CMU,

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Graph-induced structured input/output models - Case Study: Disease Association Analysis Eric Xing Lecture 25, April 16, 2014 Reading: See class

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Graph-induced structured input/output models - Case Study: Disease Association Analysis Eric Xing Lecture 23, April 6, 2016 Reading: See class

More information

Statistical Methods for Network Analysis of Biological Data

Statistical Methods for Network Analysis of Biological Data The Protein Interaction Workshop, 8 12 June 2015, IMS Statistical Methods for Network Analysis of Biological Data Minghua Deng, dengmh@pku.edu.cn School of Mathematical Sciences Center for Quantitative

More information

Abstract. Introduction

Abstract. Introduction 1 GWAS in a Box: Statistical and Visual Analytics of Structured Associations via GenAMap Eric P Xing 1,, Ross E Curtis 2,4, Georg Schoenherr 3,, Seunghak Lee 3, Junming Yin 4, Kriti Puniyani 6, Wei Wu

More information

Data Mining and Applications in Genomics

Data Mining and Applications in Genomics Data Mining and Applications in Genomics Lecture Notes in Electrical Engineering Volume 25 For other titles published in this series, go to www.springer.com/series/7818 Sio-Iong Ao Data Mining and Applications

More information

Modeling & Simulation in pharmacogenetics/personalised medicine

Modeling & Simulation in pharmacogenetics/personalised medicine Modeling & Simulation in pharmacogenetics/personalised medicine Julie Bertrand MRC research fellow UCL Genetics Institute 07 September, 2012 jbertrand@uclacuk WCOP 07/09/12 1 / 20 Pharmacogenetics Study

More information

Multi-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer

Multi-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer Multi-SNP Models for Fine-Mapping Studies: Application to an association study of the Kallikrein Region and Prostate Cancer November 11, 2014 Contents Background 1 Background 2 3 4 5 6 Study Motivation

More information

Course Announcements

Course Announcements Statistical Methods for Quantitative Trait Loci (QTL) Mapping II Lectures 5 Oct 2, 2 SE 527 omputational Biology, Fall 2 Instructor Su-In Lee T hristopher Miles Monday & Wednesday 2-2 Johnson Hall (JHN)

More information

Workshop on Data Science in Biomedicine

Workshop on Data Science in Biomedicine Workshop on Data Science in Biomedicine July 6 Room 1217, Department of Mathematics, Hong Kong Baptist University 09:30-09:40 Welcoming Remarks 9:40-10:20 Pak Chung Sham, Centre for Genomic Sciences, The

More information

Identifying Genetic Associations with MRI-derived Measures via Tree-Guided Sparse Learning

Identifying Genetic Associations with MRI-derived Measures via Tree-Guided Sparse Learning Identifying Genetic Associations with MRI-derived Measures via Tree-Guided Sparse Learning Xiaoke Hao 1, Jintai Yu 2, Daoqiang Zhang 1,* 1 Department of Computer Science and Engineering, Nanjing University

More information

http://genemapping.org/ Epistasis in Association Studies David Evans Law of Independent Assortment Biological Epistasis Bateson (99) a masking effect whereby a variant or allele at one locus prevents

More information

Including prior knowledge in shrinkage classifiers for genomic data

Including prior knowledge in shrinkage classifiers for genomic data Including prior knowledge in shrinkage classifiers for genomic data Jean-Philippe Vert Jean-Philippe.Vert@mines-paristech.fr Mines ParisTech / Curie Institute / Inserm Statistical Genomics in Biomedical

More information

Evaluation of random forest regression for prediction of breeding value from genomewide SNPs

Evaluation of random forest regression for prediction of breeding value from genomewide SNPs c Indian Academy of Sciences RESEARCH ARTICLE Evaluation of random forest regression for prediction of breeding value from genomewide SNPs RUPAM KUMAR SARKAR 1,A.R.RAO 1, PRABINA KUMAR MEHER 1, T. NEPOLEAN

More information

Appendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by

Appendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by Appendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by author] Statistical methods: All hypothesis tests were conducted using two-sided P-values

More information

Using Visualization and Automation to Accelerate Genetics Discovery

Using Visualization and Automation to Accelerate Genetics Discovery Using Visualization and Automation to Accelerate Genetics Discovery Ross Eugene Curtis December 2011 CMU-CB-11-103 Lane Center for Computational Biology School of Computer Science Carnegie Mellon University

More information

Authors: Yumin Xiao. Supervisor: Xia Shen

Authors: Yumin Xiao. Supervisor: Xia Shen Incorporating gene annotation information into fine-mapping quantitative trait loci in genome-wide association studies: a hierarchical generalized linear model approach Authors: Yumin Xiao Supervisor:

More information

Computational Genomics

Computational Genomics Computational Genomics 10-810/02 810/02-710, Spring 2009 Quantitative Trait Locus (QTL) Mapping Eric Xing Lecture 23, April 13, 2009 Reading: DTW book, Chap 13 Eric Xing @ CMU, 2005-2009 1 Phenotypical

More information

Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies

Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies p. 1/20 Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies David J. Balding Centre

More information

Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models

Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models Haiyan Huang Department of Statistics, UC Berkeley Feb 7, 2018 Background Background High dimensionality (p >> n) often results

More information

FINDING GENOME-TRANSCRIPTOME-PHENOME ASSOCIATION WITH STRUCTURED ASSOCIATION MAPPING AND VISUALIZATION IN GENAMAP

FINDING GENOME-TRANSCRIPTOME-PHENOME ASSOCIATION WITH STRUCTURED ASSOCIATION MAPPING AND VISUALIZATION IN GENAMAP FINDING GENOME-TRANSCRIPTOME-PHENOME ASSOCIATION WITH STRUCTURED ASSOCIATION MAPPING AND VISUALIZATION IN GENAMAP ROSS E CURTIS 1,2, JUNMING YIN 2, PETER KINNAIRD 3, AND ERIC P XING 4 1 Joint Carnegie

More information

Genomic Selection with Linear Models and Rank Aggregation

Genomic Selection with Linear Models and Rank Aggregation Genomic Selection with Linear Models and Rank Aggregation m.scutari@ucl.ac.uk Genetics Institute March 5th, 2012 Genomic Selection Genomic Selection Genomic Selection: an Overview Genomic selection (GS)

More information

Bayesian Variable Selection and Data Integration for Biological Regulatory Networks

Bayesian Variable Selection and Data Integration for Biological Regulatory Networks Bayesian Variable Selection and Data Integration for Biological Regulatory Networks Shane T. Jensen Department of Statistics The Wharton School, University of Pennsylvania stjensen@wharton.upenn.edu Gary

More information

Robust Prediction of Expression Differences among Human Individuals Using Only Genotype Information

Robust Prediction of Expression Differences among Human Individuals Using Only Genotype Information Robust Prediction of Expression Differences among Human Individuals Using Only Genotype Information Ohad Manor 1,2, Eran Segal 1,2 * 1 Department of Computer Science and Applied Mathematics, Weizmann Institute

More information

arxiv: v1 [stat.ap] 31 Jul 2014

arxiv: v1 [stat.ap] 31 Jul 2014 Fast Genome-Wide QTL Analysis Using MENDEL arxiv:1407.8259v1 [stat.ap] 31 Jul 2014 Hua Zhou Department of Statistics North Carolina State University Raleigh, NC 27695-8203 Email: hua_zhou@ncsu.edu Tao

More information

Haplotype Based Association Tests. Biostatistics 666 Lecture 10

Haplotype Based Association Tests. Biostatistics 666 Lecture 10 Haplotype Based Association Tests Biostatistics 666 Lecture 10 Last Lecture Statistical Haplotyping Methods Clark s greedy algorithm The E-M algorithm Stephens et al. coalescent-based algorithm Hypothesis

More information

Genotype Prediction with SVMs

Genotype Prediction with SVMs Genotype Prediction with SVMs Nicholas Johnson December 12, 2008 1 Summary A tuned SVM appears competitive with the FastPhase HMM (Stephens and Scheet, 2006), which is the current state of the art in genotype

More information

FFGWAS. Fast Functional Genome Wide Association AnalysiS of Surface-based Imaging Genetic Data

FFGWAS. Fast Functional Genome Wide Association AnalysiS of Surface-based Imaging Genetic Data FFGWAS Fast Functional Genome Wide Association AnalysiS of Surface-based Imaging Genetic Data Chao Huang Department of Biostatistics Biomedical Research Imaging Center The University of North Carolina

More information

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs (3) QTL and GWAS methods By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs Under what conditions particular methods are suitable

More information

Genome-wide association studies with high-dimensional phenotypes

Genome-wide association studies with high-dimensional phenotypes Genome-wide association studies with high-dimensional phenotypes Pekka Marttinen 1, Jussi Gillberg 1, Aki Havulinna 2, Jukka Corander 3, and Samuel Kaski 1,4 1 Helsinki Institute for Information Technology

More information

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology Lecture 2: Microarray analysis Genome wide measurement of gene transcription using DNA microarray Bruce Alberts, et al., Molecular Biology

More information

Brown University. Linear Methods for SNP Selection

Brown University. Linear Methods for SNP Selection Brown University Master s Project Report Linear Methods for SNP Selection Author: Alexander Stuart Gillmor Advisor: Sorin Istrail A report submitted in partial fulfilment of the requirements for the degree

More information

Performance of a blockwise approach in variable selection using linkage disequilibrium information

Performance of a blockwise approach in variable selection using linkage disequilibrium information Dehman et al. BMC Bioinformatics (2015) 16:148 DOI 10.1186/s12859-015-0556-6 RESEARCH ARTICLE Open Access Performance of a blockwise approach in variable selection using linkage disequilibrium information

More information

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics Genetic Variation and Genome- Wide Association Studies Keyan Salari, MD/PhD Candidate Department of Genetics How many of you did the readings before class? A. Yes, of course! B. Started, but didn t get

More information

Population differentiation analysis of 54,734 European Americans reveals independent evolution of ADH1B gene in Europe and East Asia

Population differentiation analysis of 54,734 European Americans reveals independent evolution of ADH1B gene in Europe and East Asia Population differentiation analysis of 54,734 European Americans reveals independent evolution of ADH1B gene in Europe and East Asia Kevin Galinsky Harvard T. H. Chan School of Public Health American Society

More information

Applied Bioinformatics

Applied Bioinformatics Applied Bioinformatics In silico and In clinico characterization of genetic variations Assistant Professor Department of Biomedical Informatics Center for Human Genetics Research ATCAAAATTATGGAAGAA ATCAAAATCATGGAAGAA

More information

A FAST ALGORITHM FOR DETECTING GENE GENE INTERACTIONS IN GENOME-WIDE ASSOCIATION STUDIES

A FAST ALGORITHM FOR DETECTING GENE GENE INTERACTIONS IN GENOME-WIDE ASSOCIATION STUDIES The Annals of Applied Statistics 2014, Vol. 8, No. 4, 2292 2318 DOI: 10.1214/14-AOAS771 Institute of Mathematical Statistics, 2014 A FAST ALGORITHM FOR DETECTING GENE GENE INTERACTIONS IN GENOME-WIDE ASSOCIATION

More information

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls BMI/CS 776 www.biostat.wisc.edu/bmi776/ Colin Dewey cdewey@biostat.wisc.edu Spring 2012 1. Understanding Human Genetic Variation

More information

Runs of Homozygosity Analysis Tutorial

Runs of Homozygosity Analysis Tutorial Runs of Homozygosity Analysis Tutorial Release 8.7.0 Golden Helix, Inc. March 22, 2017 Contents 1. Overview of the Project 2 2. Identify Runs of Homozygosity 6 Illustrative Example...............................................

More information

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill Introduction to Add Health GWAS Data Part I Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill Outline Introduction to genome-wide association studies (GWAS) Research

More information

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification Final Project Report Alexander Herrmann Advised by Dr. Andrew Gentles December

More information

A Propagation-based Algorithm for Inferring Gene-Disease Associations

A Propagation-based Algorithm for Inferring Gene-Disease Associations A Propagation-based Algorithm for Inferring Gene-Disease Associations Oron Vanunu Roded Sharan Abstract: A fundamental challenge in human health is the identification of diseasecausing genes. Recently,

More information

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls BMI/CS 776 www.biostat.wisc.edu/bmi776/ Mark Craven craven@biostat.wisc.edu Spring 2011 1. Understanding Human Genetic Variation!

More information

Lab 1: A review of linear models

Lab 1: A review of linear models Lab 1: A review of linear models The purpose of this lab is to help you review basic statistical methods in linear models and understanding the implementation of these methods in R. In general, we need

More information

arxiv: v1 [stat.ap] 11 May 2016

arxiv: v1 [stat.ap] 11 May 2016 Multi-class Vector AutoRegressive Models for Multi-store Sales Data Ines Wilms, Luca Barbaglia, and Christophe Croux Faculty of Economics and Business, KU Leuven arxiv:1605.03325v1 [stat.ap] 11 May 2016

More information

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011 EPIB 668 Genetic association studies Aurélie LABBE - Winter 2011 1 / 71 OUTLINE Linkage vs association Linkage disequilibrium Case control studies Family-based association 2 / 71 RECAP ON GENETIC VARIANTS

More information

LD Mapping and the Coalescent

LD Mapping and the Coalescent Zhaojun Zhang zzj@cs.unc.edu April 2, 2009 Outline 1 Linkage Mapping 2 Linkage Disequilibrium Mapping 3 A role for coalescent 4 Prove existance of LD on simulated data Qualitiative measure Quantitiave

More information

Biomarker discovery and high dimensional datasets

Biomarker discovery and high dimensional datasets Biomarker discovery and high dimensional datasets Biomedical Data Science Marco Colombo Lecture 4, 2017/2018 High-dimensional medical data In recent years, the availability of high-dimensional biological

More information

Lecture: Genetic Basis of Complex Phenotypes Advanced Topics in Computa8onal Genomics

Lecture: Genetic Basis of Complex Phenotypes Advanced Topics in Computa8onal Genomics Lecture: Genetic Basis of Complex Phenotypes 02-715 Advanced Topics in Computa8onal Genomics Genome Polymorphisms A Human Genealogy TCGAGGTATTAAC The ancestral chromosome From SNPS TCGAGGTATTAAC TCTAGGTATTAAC

More information

Introduction to Quantitative Genomics / Genetics

Introduction to Quantitative Genomics / Genetics Introduction to Quantitative Genomics / Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics September 10, 2008 Jason G. Mezey Outline History and Intuition. Statistical Framework. Current

More information

Supplementary Note: Detecting population structure in rare variant data

Supplementary Note: Detecting population structure in rare variant data Supplementary Note: Detecting population structure in rare variant data Inferring ancestry from genetic data is a common problem in both population and medical genetic studies, and many methods exist to

More information

Overview. Methods for gene mapping and haplotype analysis. Haplotypes. Outline. acatactacataacatacaatagat. aaatactacctaacctacaagagat

Overview. Methods for gene mapping and haplotype analysis. Haplotypes. Outline. acatactacataacatacaatagat. aaatactacctaacctacaagagat Overview Methods for gene mapping and haplotype analysis Prof. Hannu Toivonen hannu.toivonen@cs.helsinki.fi Discovery and utilization of patterns in the human genome Shared patterns family relationships,

More information

Bioinformatics for Biologists

Bioinformatics for Biologists Bioinformatics for Biologists Functional Genomics: Microarray Data Analysis Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Outline Introduction Working with microarray data Normalization Analysis

More information

Package trigger. R topics documented: August 16, Type Package

Package trigger. R topics documented: August 16, Type Package Type Package Package trigger August 16, 2018 Title Transcriptional Regulatory Inference from Genetics of Gene ExpRession Version 1.26.0 Author Lin S. Chen , Dipen P. Sangurdekar

More information

Computational Workflows for Genome-Wide Association Study: I

Computational Workflows for Genome-Wide Association Study: I Computational Workflows for Genome-Wide Association Study: I Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 16, 2014 Outline 1 Outline 2 3 Monogenic Mendelian Diseases

More information

Machine Learning in Computational Biology CSC 2431

Machine Learning in Computational Biology CSC 2431 Machine Learning in Computational Biology CSC 2431 Lecture 9: Combining biological datasets Instructor: Anna Goldenberg What kind of data integration is there? What kind of data integration is there? SNPs

More information

GCTA/GREML. Rebecca Johnson. March 30th, 2017

GCTA/GREML. Rebecca Johnson. March 30th, 2017 GCTA/GREML Rebecca Johnson March 30th, 2017 1 / 12 Motivation for method We know from twin studies and other methods that genetic variation contributes to complex traits like height, BMI, educational attainment,

More information

Detecting gene-gene interactions in high-throughput genotype data through a Bayesian clustering procedure

Detecting gene-gene interactions in high-throughput genotype data through a Bayesian clustering procedure Detecting gene-gene interactions in high-throughput genotype data through a Bayesian clustering procedure Sui-Pi Chen and Guan-Hua Huang Institute of Statistics National Chiao Tung University Hsinchu,

More information

Intro Logistic Regression Gradient Descent + SGD

Intro Logistic Regression Gradient Descent + SGD Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade March 29, 2016 1 Ad Placement

More information

General aspects of genome-wide association studies

General aspects of genome-wide association studies General aspects of genome-wide association studies Abstract number 20201 Session 04 Correctly reporting statistical genetics results in the genomic era Pekka Uimari University of Helsinki Dept. of Agricultural

More information

Course: IT and Health

Course: IT and Health From Genotype to Phenotype Reconstructing the past: The First Ancient Human Genome 2762 Course: IT and Health Kasper Nielsen Center for Biological Sequence Analysis Technical University of Denmark Outline

More information

Association studies (Linkage disequilibrium)

Association studies (Linkage disequilibrium) Positional cloning: statistical approaches to gene mapping, i.e. locating genes on the genome Linkage analysis Association studies (Linkage disequilibrium) Linkage analysis Uses a genetic marker map (a

More information

Bioinformatics. Microarrays: designing chips, clustering methods. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute

Bioinformatics. Microarrays: designing chips, clustering methods. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Microarrays: designing chips, clustering methods Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Course Syllabus Jan 7 Jan 14 Jan 21 Jan 28 Feb 4 Feb 11 Feb 18 Feb 25 Sequence

More information

Copyright Warning & Restrictions

Copyright Warning & Restrictions Copyright Warning & Restrictions The copyright law of the United States (Title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material. Under certain conditions

More information

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University Machine learning applications in genomics: practical issues & challenges Yuzhen Ye School of Informatics and Computing, Indiana University Reference Machine learning applications in genetics and genomics

More information

AN EVALUATION OF POWER TO DETECT LOW-FREQUENCY VARIANT ASSOCIATIONS USING ALLELE-MATCHING TESTS THAT ACCOUNT FOR UNCERTAINTY

AN EVALUATION OF POWER TO DETECT LOW-FREQUENCY VARIANT ASSOCIATIONS USING ALLELE-MATCHING TESTS THAT ACCOUNT FOR UNCERTAINTY AN EVALUATION OF POWER TO DETECT LOW-FREQUENCY VARIANT ASSOCIATIONS USING ALLELE-MATCHING TESTS THAT ACCOUNT FOR UNCERTAINTY E. ZEGGINI and J.L. ASIMIT Wellcome Trust Sanger Institute, Hinxton, CB10 1HH,

More information

SNPs - GWAS - eqtls. Sebastian Schmeier

SNPs - GWAS - eqtls. Sebastian Schmeier SNPs - GWAS - eqtls s.schmeier@gmail.com http://sschmeier.github.io/bioinf-workshop/ 17.08.2015 Overview Single nucleotide polymorphism (refresh) SNPs effect on genes (refresh) Genome-wide association

More information

Why do we need statistics to study genetics and evolution?

Why do we need statistics to study genetics and evolution? Why do we need statistics to study genetics and evolution? 1. Mapping traits to the genome [Linkage maps (incl. QTLs), LOD] 2. Quantifying genetic basis of complex traits [Concordance, heritability] 3.

More information

A Bayesian Graphical Model for Genome-wide Association Studies (GWAS)

A Bayesian Graphical Model for Genome-wide Association Studies (GWAS) A Bayesian Graphical Model for Genome-wide Association Studies (GWAS) Laurent Briollais 1,2, Adrian Dobra 3, Jinnan Liu 1, Hilmi Ozcelik 1 and Hélène Massam 4 Prosserman Centre for Health Research, Samuel

More information

An Analytical Upper Bound on the Minimum Number of. Recombinations in the History of SNP Sequences in Populations

An Analytical Upper Bound on the Minimum Number of. Recombinations in the History of SNP Sequences in Populations An Analytical Upper Bound on the Minimum Number of Recombinations in the History of SNP Sequences in Populations Yufeng Wu Department of Computer Science and Engineering University of Connecticut Storrs,

More information

High-density SNP Genotyping Analysis of Broiler Breeding Lines

High-density SNP Genotyping Analysis of Broiler Breeding Lines Animal Industry Report AS 653 ASL R2219 2007 High-density SNP Genotyping Analysis of Broiler Breeding Lines Abebe T. Hassen Jack C.M. Dekkers Susan J. Lamont Rohan L. Fernando Santiago Avendano Aviagen

More information

Understanding genetic association studies. Peter Kamerman

Understanding genetic association studies. Peter Kamerman Understanding genetic association studies Peter Kamerman Outline CONCEPTS UNDERLYING GENETIC ASSOCIATION STUDIES Genetic concepts: - Underlying principals - Genetic variants - Linkage disequilibrium -

More information

QTL Mapping, MAS, and Genomic Selection

QTL Mapping, MAS, and Genomic Selection QTL Mapping, MAS, and Genomic Selection Dr. Ben Hayes Department of Primary Industries Victoria, Australia A short-course organized by Animal Breeding & Genetics Department of Animal Science Iowa State

More information

BTRY 7210: Topics in Quantitative Genomics and Genetics

BTRY 7210: Topics in Quantitative Genomics and Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu Spring 2015, Thurs.,12:20-1:10

More information

Axiom Biobank Genotyping Solution

Axiom Biobank Genotyping Solution TCCGGCAACTGTA AGTTACATCCAG G T ATCGGCATACCA C AGTTAATACCAG A Axiom Biobank Genotyping Solution The power of discovery is in the design GWAS has evolved why and how? More than 2,000 genetic loci have been

More information

Module 1 Principles of plant breeding

Module 1 Principles of plant breeding Covered topics, Distance Learning course Plant Breeding M1-M5 V2.0 Dr. Jan-Kees Goud, Wageningen University & Research The five main modules consist of the following content: Module 1 Principles of plant

More information

arxiv: v1 [stat.ap] 3 Feb 2015

arxiv: v1 [stat.ap] 3 Feb 2015 The Annals of Applied Statistics 2014, Vol. 8, No. 4, 2292 2318 DOI: 10.1214/14-AOAS771 c Institute of Mathematical Statistics, 2014 A FAST ALGORITHM FOR DETECTING GENE GENE INTERACTIONS IN GENOME-WIDE

More information

Nature Genetics: doi: /ng Supplementary Figure 1. QQ plots of P values from the SMR tests under a range of simulation scenarios.

Nature Genetics: doi: /ng Supplementary Figure 1. QQ plots of P values from the SMR tests under a range of simulation scenarios. Supplementary Figure 1 QQ plots of P values from the SMR tests under a range of simulation scenarios. The simulation strategy is described in the Supplementary Note. Shown are the results from 10,000 simulations.

More information

Stochastic Convex Sparse Principal Component Analysis

Stochastic Convex Sparse Principal Component Analysis Stochastic Convex Sparse Principal Component Analysis Inci M. Baytas 1, Kaixiang Lin 1, Fei Wang 2, Anil K. Jain 1 and Jiayu Zhou 1 1 Michigan State University 2 Cornell University 1 Principal Component

More information

Lecture 23: Causes and Consequences of Linkage Disequilibrium. November 16, 2012

Lecture 23: Causes and Consequences of Linkage Disequilibrium. November 16, 2012 Lecture 23: Causes and Consequences of Linkage Disequilibrium November 16, 2012 Last Time Signatures of selection based on synonymous and nonsynonymous substitutions Multiple loci and independent segregation

More information

EAAP : 29/08-02/ A L I M E N T A T I O N A G R I C U L T U R E E N V I R O N N E M E N T

EAAP : 29/08-02/ A L I M E N T A T I O N A G R I C U L T U R E E N V I R O N N E M E N T Description of the French genomic Marker Assisted by Selection program P. Croiseau, C. Hoze, S. Fritz, F. Guillaume, C. Colombani, A. Legarra, A. Baur, C. Robert-Granié, D. Boichard and V. Ducrocq Use

More information

BTRY 7210: Topics in Quantitative Genomics and Genetics

BTRY 7210: Topics in Quantitative Genomics and Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu January 29, 2015 Why you re here

More information

Genomic Selection in Breeding Programs BIOL 509 November 26, 2013

Genomic Selection in Breeding Programs BIOL 509 November 26, 2013 Genomic Selection in Breeding Programs BIOL 509 November 26, 2013 Omnia Ibrahim omniyagamal@yahoo.com 1 Outline 1- Definitions 2- Traditional breeding 3- Genomic selection (as tool of molecular breeding)

More information

Using Genomics to Guide Immunosuppression Therapy David A. Baran, MD, FACC, FSCAI System Director, Advanced HF, Transplant and MCS, Sentara Heart

Using Genomics to Guide Immunosuppression Therapy David A. Baran, MD, FACC, FSCAI System Director, Advanced HF, Transplant and MCS, Sentara Heart Using Genomics to Guide Immunosuppression Therapy David A. Baran, MD, FACC, FSCAI System Director, Advanced HF, Transplant and MCS, Sentara Heart Hospital, Norfolk, VA Disclosure Consulting: Livanova,

More information

Crash-course in genomics

Crash-course in genomics Crash-course in genomics Molecular biology : How does the genome code for function? Genetics: How is the genome passed on from parent to child? Genetic variation: How does the genome change when it is

More information

Genome Wide Association Study for Binomially Distributed Traits: A Case Study for Stalk Lodging in Maize

Genome Wide Association Study for Binomially Distributed Traits: A Case Study for Stalk Lodging in Maize Genome Wide Association Study for Binomially Distributed Traits: A Case Study for Stalk Lodging in Maize Esperanza Shenstone and Alexander E. Lipka Department of Crop Sciences University of Illinois at

More information

Using Canonical Correlation Analysis to Discover Genetic Regulatory Variants

Using Canonical Correlation Analysis to Discover Genetic Regulatory Variants Using Canonical Correlation Analysis to Discover Genetic Regulatory Variants Melissa G. Naylor 1 *, Xihong Lin 1, Scott T. Weiss 2, Benjamin A. Raby 2, Christoph Lange 1,2 1 Department of Biostatistics,

More information

DNA Based Disease Prediction using pathway Analysis

DNA Based Disease Prediction using pathway Analysis 2017 IEEE 7th International Advance Computing Conference DNA Based Disease Prediction using pathway Analysis Syeeda Farah Dr.Asha T Cauvery B and Sushma M S Department of Computer Science and Shivanand

More information

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer T. M. Murali January 31, 2006 Innovative Application of Hierarchical Clustering A module map showing conditional

More information

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010 Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010 Traditional QTL approach Uses standard bi-parental mapping populations o F2 or RI These have a limited number of

More information

Bioinformatics opportunities in Genomics and Genetics

Bioinformatics opportunities in Genomics and Genetics Bioinformatics opportunities in Genomics and Genetics Case Study: Prediction of novel gene functions of NSF1/YPL230W in Saccharomyces Cerevisiae via search for maximally interconnected sub-graph Kyrylo

More information

/ Computational Genomics. Time series analysis

/ Computational Genomics. Time series analysis 10-810 /02-710 Computational Genomics Time series analysis Expression Experiments Static: Snapshot of the activity in the cell Time series: Multiple arrays at various temporal intervals Time Series Examples:

More information

A developmental template for evolutionary design

A developmental template for evolutionary design R. Stouffs, P. Janssen, S. Roudavski, B. Tunçer (eds.), Open Systems: Proceedings of the 18th International Conference on Computer-Aided Architectural Design Research in Asia (CAADRIA 2013), 705 714. 2013,

More information

Single Nucleotide Polymorphisms (SNPs)

Single Nucleotide Polymorphisms (SNPs) Single Nucleotide Polymorphisms (SNPs) Sequence variations Single nucleotide polymorphisms Insertions/deletions Copy number variations (large: >1kb) Variable (short) number tandem repeats Single Nucleotide

More information

Genome-wide QTL and eqtl analyses using Mendel

Genome-wide QTL and eqtl analyses using Mendel The Author(s) BMC Proceedings 2016, 10(Suppl 7):10 DOI 10.1186/s12919-016-0037-6 BMC Proceedings PROCEEDINGS Genome-wide QTL and eqtl analyses using Mendel Hua Zhou 1*, Jin Zhou 3, Tao Hu 1,2, Eric M.

More information

S SG. Metabolomics meets Genomics. Hemant K. Tiwari, Ph.D. Professor and Head. Metabolomics: Bench to Bedside. ection ON tatistical.

S SG. Metabolomics meets Genomics. Hemant K. Tiwari, Ph.D. Professor and Head. Metabolomics: Bench to Bedside. ection ON tatistical. S SG ection ON tatistical enetics Metabolomics meets Genomics Hemant K. Tiwari, Ph.D. Professor and Head Section on Statistical Genetics Department of Biostatistics School of Public Health Metabolomics:

More information

Copyright Warning & Restrictions

Copyright Warning & Restrictions Copyright Warning & Restrictions The copyright law of the United States (Title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material. Under certain conditions

More information

The Future of IntegrOmics

The Future of IntegrOmics The Future of IntegrOmics Kristel Van Steen, PhD 2 kristel.vansteen@ulg.ac.be Systems and Modeling Unit, Montefiore Institute, University of Liège, Belgium Bioinformatics and Modeling, GIGA-R, University

More information

Traditional Genetic Improvement. Genetic variation is due to differences in DNA sequence. Adding DNA sequence data to traditional breeding.

Traditional Genetic Improvement. Genetic variation is due to differences in DNA sequence. Adding DNA sequence data to traditional breeding. 1 Introduction What is Genomic selection and how does it work? How can we best use DNA data in the selection of cattle? Mike Goddard 5/1/9 University of Melbourne and Victorian DPI of genomic selection

More information

Improvement of Association-based Gene Mapping Accuracy by Selecting High Rank Features

Improvement of Association-based Gene Mapping Accuracy by Selecting High Rank Features Improvement of Association-based Gene Mapping Accuracy by Selecting High Rank Features 1 Zahra Mahoor, 2 Mohammad Saraee, 3 Mohammad Davarpanah Jazi 1,2,3 Department of Electrical and Computer Engineering,

More information