BTRY 7210: Topics in Quantitative Genomics and Genetics


BTRY 7210: Topics in Quantitative Genomics and Genetics
Jason Mezey
Biological Statistics and Computational Biology (BSCB)
Department of Genetic Medicine
jgm45@cornell.edu
January 29, 2015

Why you're here (= eQTL): Spring 2015 Course Announcement
BTRY 7210: Topics in Quantitative Genomics and Genetics
Professor: Jason Mezey, Biological Statistics and Computational Biology
Time: Thurs. 12:20-1:10 PM
Room: 224 Weill Hall
COURSE DESCRIPTION: We will consider the problem of identifying and leveraging expression Quantitative Trait Loci (eQTL) when analyzing genome-wide data. The class will combine lectures by the instructor with reading and discussion of papers (ratio TBD). Students taking the class for a grade will be required to produce a single mini-critique report on current papers touching on topics covered in the class. General topic areas will include: probability and statistics necessary for understanding eQTL analysis; the basics of eQTL analysis; quality control and model checking in eQTL analysis; advanced eQTL analysis techniques, including hidden factor analysis; extending analyses to xQTL; biological value and interpretation of eQTL; combining eQTL and other bioinformatics data for biological discovery; and the structure and application of probabilistic graphical models that make use of eQTL for network analysis and discovery.
GRADING: S/U or Audit.
CREDITS: 1
SUGGESTED PREREQUISITES: Quantitative Genomics and Genetics (BTRY 6830 / 4830) and/or background in statistics and/or background in eQTL, GWAS, or related genetic mapping analyses.

Today: logistics reminders; introduction to eQTL, part 2

Logistics
Format: a combination of lectures and possibly discussion of papers (that I will select) focused on specific subjects.
Updated information is on the class website (bottom of the classes page): http://mezeylab.cb.bscb.cornell.edu/
Make sure you are on the listserv (!!!) (email me to join or be removed): mezey-groupm-l@cornell.edu
If you can register for the class, please do (= Audit). You may also register S/U, in which case either leading a discussion or a minor (~10-20 hour) final report will be required.

The eQTL concept II
expression Quantitative Trait Locus (eQTL): a polymorphic locus where an experimental exchange of one allele for another ($A_1 \to A_2$) produces a change in expression ($Y$) on average, under specified conditions.
The allelic states defined by the original mutation event define the causal polymorphism of the eQTL.
Intuitive example: if rs27290 were the causal polymorphism, changing A -> G would change the measured expression of ERAP2.
[Figure: ERAP2 expression by rs27290 genotype (A/A, A/G, G/G).]

Detecting eQTL from the analysis of genome-wide data I
Since an eQTL reflects a case where different allelic combinations (genotypes) lead to different levels of gene expression, we could in theory discover an eQTL by testing for an association between measured genotypes and gene expression levels. Most eQTL are discovered using this type of approach.
A typical (human) eQTL experiment includes m (~10-30K) expression variables and N (~0.1-10 million) genotypes measured in n individuals sampled from a population.
A typical (most!) analysis of such data proceeds by performing independent statistical tests of (a subset of) genotype-expression pairs; tests that are significant after a multiple-test correction (e.g., Bonferroni) are assumed to indicate an eQTL.
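The scan described above can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the course's pipeline: the additive -1/0/1 genotype coding, the gene and SNP names, the effect size, and the use of Fisher's z-transform with a normal approximation to obtain p-values are all assumptions made for the example.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def assoc_pvalue(x, y):
    """Approximate two-sided p-value for no association, using Fisher's
    z-transform, which is roughly N(0, 1) under the null for moderate n."""
    r = max(min(pearson_r(x, y), 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def eqtl_scan(expression, genotypes, alpha=0.05):
    """Test every (gene, SNP) pair independently; keep pairs whose p-value
    passes the Bonferroni threshold alpha / (number of tests)."""
    threshold = alpha / (len(expression) * len(genotypes))
    hits = []
    for gene, y in expression.items():
        for snp, x in genotypes.items():
            p = assoc_pvalue(x, y)
            if p < threshold:
                hits.append((gene, snp, p))
    return hits, threshold


# Hypothetical toy data: 200 individuals, 5 SNPs, 3 genes.
rng = random.Random(1)
n = 200
genotypes = {f"snp{k}": [rng.choice([-1, 0, 1]) for _ in range(n)] for k in range(5)}
expression = {f"gene{j}": [rng.gauss(0.0, 1.0) for _ in range(n)] for j in range(3)}
# Plant a simulated additive eQTL: gene0 shifts by 1.0 per unit of snp2's coding.
expression["gene0"] = [e + 1.0 * g for e, g in zip(expression["gene0"], genotypes["snp2"])]

hits, threshold = eqtl_scan(expression, genotypes)
print(f"Bonferroni threshold: {threshold:.2e}")
for gene, snp, p in hits:
    print(gene, snp, f"{p:.1e}")
```

Treating every pair as an independent test keeps the logic simple; because nearby SNPs are correlated through LD, Bonferroni is conservative in this setting.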

Detecting eQTL from the analysis of genome-wide data II
We (almost) always detect eQTL by testing (non-causal) markers in linkage disequilibrium (LD) with the causal polymorphism.
[Figure: ERAP2 expression by genotype at rs27290 (A/A, A/G, G/G) and at T45G (T/T, T/G, G/G).]
[Image credit: Journal of Diabetes and its Complications; ScienceDirect; Vendramini et al.]
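A small simulation can make the LD point concrete. In this hypothetical sketch, expression depends only on a causal variant, while a nearby marker copies the causal allele's state on each haplotype with probability `copy_prob` and is random otherwise; the marker has no effect of its own, yet it still shows a strong association. The haplotype model, `copy_prob = 0.9`, the effect size, and the Fisher z-based p-value are all illustrative assumptions.

```python
import math
import random


def fisher_pvalue(x, y):
    """Approximate two-sided p-value for zero correlation (Fisher z-transform)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = max(min(sxy / math.sqrt(sxx * syy), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_ld(n, copy_prob, effect, rng):
    """Return (causal, marker, expression) for n diploid individuals.

    Each haplotype carries a causal allele (0/1, frequency 0.5); the marker
    allele on that haplotype copies it with probability copy_prob, otherwise
    it is drawn at random. Genotypes use the -1/0/1 coding, and expression
    depends only on the causal genotype.
    """
    causal, marker, expr = [], [], []
    for _ in range(n):
        c_count = m_count = 0
        for _hap in range(2):
            c = int(rng.random() < 0.5)
            m = c if rng.random() < copy_prob else int(rng.random() < 0.5)
            c_count += c
            m_count += m
        causal.append(c_count - 1)
        marker.append(m_count - 1)
        expr.append(effect * (c_count - 1) + rng.gauss(0.0, 1.0))
    return causal, marker, expr


rng = random.Random(7)
causal, marker, expr = simulate_ld(n=500, copy_prob=0.9, effect=1.0, rng=rng)
p_causal = fisher_pvalue(causal, expr)
p_marker = fisher_pvalue(marker, expr)
print(f"causal p = {p_causal:.1e}, marker p = {p_marker:.1e}")
```

Lowering `copy_prob` (weaker LD) shrinks the marker's correlation with the causal genotype and, with it, the marker's signal.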

Genome-wide scan for eQTL: typical outcome
[Figure panels: eQTL ($p < 10^{-30}$), ERAP2 expression by rs27290 genotype (A/A, A/G, G/G); no eQTL (n.s.), ERAP2 expression by rs1908530 genotype (T/T, T/C, C/C).]

Statistical foundation I (see BTRY 6830 lectures on class site)
We need to begin by defining our sample space for an eQTL experiment: $\Omega = \{\text{possible individuals}\}$, with an associated σ-algebra $\mathcal{F}$.
For each individual in our sample space, we are interested in pairs of sample outcomes (a single pair at a time!):
$\Omega = \Omega_g \times \Omega_P$
where $\Omega_g$ is the set of possible genotype outcomes for an individual at a locus and $\Omega_P$ is the set of values of the expression variable for an individual.
Note that for a diploid with two alleles (typical for humans!): $\Omega_g = \{A_1A_1, A_1A_2, A_2A_2\}$

Statistical foundation II
Next, we need to define the probability model: $\Pr(\mathcal{F}_{\Omega}) = \Pr(\mathcal{F}_{g,P})$
We will define two (types of) random variables (* = state does not matter):
$Y: (*, \Omega_P) \to \mathbb{R}$, where $Y$ = measured expression value
$X: (\Omega_g, *) \to \mathbb{R}$, with $X(A_1A_1) = -1$, $X(A_1A_2) = 0$, $X(A_2A_2) = 1$
Note that the probability model induces a (joint) probability distribution on these random variables: $\Pr(Y, X)$
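As a concrete reading of these definitions, the sketch below maps genotype outcomes to values of X using the coding above and estimates Cov(Y, X) from a sample. The four individuals and their expression values are made up for illustration.

```python
# X(A1A1) = -1, X(A1A2) = 0, X(A2A2) = 1
X_CODING = {"A1A1": -1, "A1A2": 0, "A2A2": 1}


def encode(genotypes):
    """Map genotype outcomes (elements of Omega_g) to values of X."""
    return [X_CODING[g] for g in genotypes]


def sample_cov(x, y):
    """Unbiased sample covariance (divides by n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


# Hypothetical sample of (genotype, expression) pairs, one pair per individual.
genotype_obs = ["A1A1", "A1A2", "A2A2", "A1A2"]
y = [3.9, 4.6, 5.4, 4.4]
x = encode(genotype_obs)
print(x)                 # [-1, 0, 1, 0]
print(sample_cov(x, y))  # approximately 0.5
```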

Statistical foundation III
To assess whether the marker genotype indicates an eQTL, we need to assess the following hypotheses:
$H_0: \mathrm{Cov}(Y, X) = 0 \qquad H_A: \mathrm{Cov}(Y, X) \neq 0$
To do this, we collect a sample of size n of expression and genotype pairs (y, x) and define a statistic T(y, x) whose distribution under the null hypothesis we know, so that we can calculate a p-value:
$\text{pval} = \Pr(|T| \geq |t| \mid H_0 \text{ true})$
p-value: the probability of obtaining a value of the statistic as extreme as, or more extreme than, the observed value t, conditional on $H_0$ being true.
To analyze the data from a genome-wide eQTL experiment, we calculate a p-value for each of (a subset of) the total set of expression-genotype pairs, and where we reject the null (at an appropriate multiple-test-corrected type I error), we assume that this indicates an eQTL.
Note that we usually take a run of contiguous genotypes for which we reject the null for the same expression variable to indicate the position of a single causal eQTL polymorphism.
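The last point, treating a run of contiguous significant genotypes as one putative eQTL, can be sketched as a simple grouping step. This assumes the SNPs tested against a single expression variable are already sorted by position on one chromosome; the SNP IDs, p-values, and threshold in the example are hypothetical, and real pipelines often use LD-based clumping rather than strict adjacency.

```python
def significant_runs(snp_ids, pvalues, threshold):
    """Group consecutive SNPs with p < threshold into runs.

    snp_ids and pvalues are aligned lists for one expression variable,
    ordered by genomic position on a single chromosome. Each returned run
    is a list of SNP IDs taken to mark one putative causal eQTL polymorphism.
    """
    runs, current = [], []
    for snp, p in zip(snp_ids, pvalues):
        if p < threshold:
            current.append(snp)
        elif current:  # a non-significant SNP closes the current run
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


snp_ids = ["s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7"]
pvalues = [0.2, 1e-12, 1e-20, 1e-9, 0.5, 0.3, 1e-11, 0.7]
print(significant_runs(snp_ids, pvalues, threshold=1e-8))
# [['s1', 's2', 's3'], ['s6']]
```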


Typical outcome: zooming in, and cis- vs. trans-
This is a cis-eQTL because the significant genotypes are located at the same genomic location as the expressed gene (otherwise, it would be a trans-eQTL).
Most eQTL are cis-, which makes biological sense.
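The cis/trans distinction reduces to a location rule. The sketch below assumes a 1 Mb window around the gene's transcription start site on the same chromosome, a common convention but an assumption here (the lecture does not fix a distance); the gene and SNP coordinates are hypothetical.

```python
from dataclasses import dataclass

# Assumed convention: a SNP within 1 Mb of the gene's TSS on the same chromosome is "cis".
CIS_WINDOW_BP = 1_000_000


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site (bp)


@dataclass
class Snp:
    name: str
    chrom: str
    pos: int  # position (bp)


def classify_eqtl(snp: Snp, gene: Gene, window: int = CIS_WINDOW_BP) -> str:
    """Label a significant SNP-gene pair as 'cis' or 'trans' by location."""
    if snp.chrom == gene.chrom and abs(snp.pos - gene.tss) <= window:
        return "cis"
    return "trans"


# Hypothetical coordinates for illustration only.
gene = Gene("geneA", "chr5", 96_900_000)
for snp in [Snp("snp1", "chr5", 96_950_000),
            Snp("snp2", "chr5", 120_000_000),
            Snp("snp3", "chr2", 96_900_000)]:
    print(snp.name, classify_eqtl(snp, gene))
```

Here snp1 is labeled cis, while snp2 (same chromosome, far away) and snp3 (different chromosome) are labeled trans.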

What we will discuss I: genome-wide identification of eQTL, scaling from one gene and one SNP, to one gene and multiple SNPs, to one gene and all SNPs, to all genes and all SNPs.
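To see why the scale of these designs matters for multiple-test correction, here is the arithmetic for the number of tests and the Bonferroni threshold at α = 0.05, taking illustrative values m = 2 × 10^4 genes and N = 10^6 SNPs (within the ranges given for a typical human eQTL experiment) and a hypothetical k SNPs for the "multiple SNPs" case:

```latex
\begin{aligned}
\text{one gene, one SNP:} &\quad 1 \text{ test} \\
\text{one gene, } k \text{ SNPs:} &\quad k \text{ tests} \\
\text{one gene, all SNPs:} &\quad N = 10^{6} \text{ tests} \\
\text{all genes, all SNPs:} &\quad mN = (2 \times 10^{4})(10^{6}) = 2 \times 10^{10} \text{ tests} \\
\text{Bonferroni threshold (all genes, all SNPs):} &\quad \frac{\alpha}{mN} = \frac{0.05}{2 \times 10^{10}} = 2.5 \times 10^{-12}
\end{aligned}
```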

What we will discuss II: reducing eQTL false positives
Population structure and hidden factors can cause false-positive associations, i.e., correlations that do not represent true genetic effects.
These effects are visible on the p-value heatmap. [Figure panels: population structure; hidden factor.]
We can sometimes remove these artifacts by including appropriate covariates in our analysis, by including them in a mixed model (we will also consider binary phenotypes!), or by using a hidden factor analysis.
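The effect of population structure, and of adjusting for it with a covariate, can be illustrated with a simulation in which a SNP has no effect on expression but both its allele frequency and mean expression differ between two subpopulations. This is a hypothetical sketch: the allele frequencies (0.2 vs. 0.8), the expression shift, and the assumption that subpopulation labels are known are all made up. Adjustment is done by regressing the subpopulation indicator out of both X and Y and testing the residuals, which for a linear model yields the same partial correlation as including the covariate directly; it does not implement a mixed model or hidden factor analysis.

```python
import math
import random


def fisher_pvalue(x, y):
    """Approximate two-sided p-value for zero correlation (Fisher z-transform)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = max(min(sxy / math.sqrt(sxx * syy), 0.999999), -0.999999)
    return math.erfc(abs(math.atanh(r) * math.sqrt(n - 3)) / math.sqrt(2))


def residualize(values, groups):
    """Subtract each group's mean: the residuals from regressing on a group indicator."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


rng = random.Random(3)
n = 1000
pop = [rng.randrange(2) for _ in range(n)]  # hypothetical subpopulation label
freq = {0: 0.2, 1: 0.8}                     # A2 allele frequency per subpopulation
# Genotype with -1/0/1 coding; depends on subpopulation only.
x = [sum(rng.random() < freq[p] for _ in range(2)) - 1 for p in pop]
# Expression shifts with subpopulation only; the SNP has no effect.
y = [1.5 * p + rng.gauss(0.0, 1.0) for p in pop]

p_naive = fisher_pvalue(x, y)
p_adjusted = fisher_pvalue(residualize(x, pop), residualize(y, pop))
print(f"unadjusted p = {p_naive:.1e}, adjusted p = {p_adjusted:.2f}")
```

The unadjusted test reports a strong (spurious) association; after removing the subpopulation means, the SNP no longer appears associated.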

What we will discuss III: can we identify causal genotypes (alleles)?

What we will discuss IV: leveraging eQTL for annotation. eQTL SNPs tend to be in linkage disequilibrium (LD) blocks enriched for ENCODE transcription factor motifs, suggesting a functional mechanism.

What we will discuss V: leveraging eQTL for GWAS. eQTL co-localize with disease loci identified in GWAS, indicating a common genetic basis and providing a method for identifying candidate causal polymorphisms for disease risk.

What we will discuss VI: leveraging eQTL for network inference. eQTL are used within probabilistic graphical modeling (PGM) frameworks to discover new network / pathway / regulatory relationships.

That's it for today. Reminder: please email me to join the listserv (!!)