Package FSTpackage. June 27, 2017

Similar documents
Package ASGSCA. February 24, 2018

Lees J.A., Vehkala M. et al., 2016 In Review

Runs of Homozygosity Analysis Tutorial

Population Genetics. Lab Exercise 14. Introduction. Contents. Objectives

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

We can use a Punnett Square to determine how the gametes will recombine in the next, or F2 generation.

AP Biology Laboratory 8 Population Genetics Virtual Student Guide

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

PLINK gplink Haploview

Derrek Paul Hibar

BTRY 7210: Topics in Quantitative Genomics and Genetics

General aspects of genome-wide association studies

The Making of the Fittest: Natural Selection in Humans

University of Groningen. The value of haplotypes Vries, Anne René de

H3A - Genome-Wide Association testing SOP

Genetic Equilibrium: Human Diversity Student Version

Package stepwise. R topics documented: February 20, 2015

HARDY-WEINBERG EQUILIBRIUM

Axiom mydesign Custom Array design guide for human genotyping applications

Genetic load. For the organism as a whole (its genome, and the species), what is the fitness cost of deleterious mutations?

MMAP Genomic Matrix Calculations

POPULATION GENETICS AND EVOLUTION

Package ABC.RAP. October 20, 2016

Bean Bunny Evolution Modeling Gene Frequency Change (Evolution) in a Population by Natural Selection

Package lddesign. February 15, 2013

Package GeneExpressionSignature

A GENOTYPE CALLING ALGORITHM FOR AFFYMETRIX SNP ARRAYS

Supplementary Note: Detecting population structure in rare variant data

Hierarchical Linear Modeling: A Primer 1 (Measures Within People) R. C. Gardner Department of Psychology

Supplementary Figures

USER MANUAL for the use of the human Genome Clinical Annotation Tool (h-gcat) uthors: Klaas J. Wierenga, MD & Zhijie Jiang, P PhD

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

EVOLUTION/HERDEDITY UNIT Unit 1 Part 8A Chapter 23 Activity Lab #11 A POPULATION GENETICS AND EVOLUTION

Exploring the Genetic Basis of Congenital Heart Defects

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics

Cytochrome P450 Genotype Panel

Hardy-Weinberg Principle

Computational Workflows for Genome-Wide Association Study: I

Population and Community Dynamics. The Hardy-Weinberg Principle


Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University

Concepts of Genetics Ninth Edition Klug, Cummings, Spencer, Palladino

AP BIOLOGY Population Genetics and Evolution Lab

A Statistical Framework for Joint eqtl Analysis in Multiple Tissues

Gene expression connectivity mapping and its application to Cat-App

Linking Genetic Variation to Important Phenotypes

Tutorial #3: Brand Pricing Experiment

KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies

Gap hunting to characterize clustered probe signals in Illumina methylation array data

Package goseq. R topics documented: December 23, 2017

MENDELIAN GENETICS This presentation contains copyrighted material under the educational fair use exemption to the U.S. copyright law.

Terminology: chromosome; gene; allele; proteins; enzymes

Genome-wide association studies (GWAS) Part 1

Exploring genomic databases: Practical session "

Using the Trio Workflow in Partek Genomics Suite v6.6

OBJECTIVES-ACTIVITIES 2-4

S SG. Metabolomics meets Genomics. Hemant K. Tiwari, Ph.D. Professor and Head. Metabolomics: Bench to Bedside. ection ON tatistical.

Statistical Tests for Admixture Mapping with Case-Control and Cases-Only Data

Package powerpkg. February 20, 2015

OPTIMIZATION OF BREEDING SCHEMES USING GENOMIC PREDICTIONS AND SIMULATIONS

Biotools: an R function to predict spatial gene diversity via an individual-based approach

7-1. Read this exercise before you come to the laboratory. Review the lecture notes from October 15 (Hardy-Weinberg Equilibrium)

Association Rule Discovery Has the Ability to Model Complex Genetic Effects

SVMerge Output File Format Specification Sheet

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score

SNPassoc: an R package to perform whole genome association studies

Oncomine cfdna Assays Part III: Variant Analysis

Observing Patterns In Inherited Traits

The Making of the Fittest: Natural Selection and Adaptation Allele and phenotype frequencies in rock pocket mouse populations

Human linkage analysis. fundamental concepts

LAB 19 Population Genetics and Evolution II

Week 7 - Natural Selection and Genetic Variation for Allozymes

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

11.1. A population shares a common gene pool. The Evolution of Populations CHAPTER 11. Fill in the concept map below.

LAB ACTIVITY ONE POPULATION GENETICS AND EVOLUTION 2017

Biol Lecture Notes

POPULATION GENETICS. Evolution Lectures 4

Mapping and Mapping Populations

3 Ways to Improve Your Targeted Marketing with Analytics

Phasing of 2-SNP Genotypes based on Non-Random Mating Model

Package BAGS. September 21, 2014

Rapid Cycle PCR, Real Time Analysis, and Hi-Res Melting

Molecular Markers CRITFC Genetics Workshop December 9, 2014

Genome-Wide Association Studies (GWAS): Computational Them

A Bayesian Graphical Model for Genome-wide Association Studies (GWAS)

Gene Set Enrichment Analysis! Robert Gentleman!

RareVariantVis 2: R suite for analysis of rare variants in whole genome sequencing data.

Using mutants to clone genes

Package subgroup. February 20, 2015

Bio 6 Natural Selection Lab

Package ChannelAttribution

SUPPLEMENTARY INFORMATION

Genetics module. DNA Structure, Replication. The Genetic Code; Transcription and Translation. Principles of Heredity; Gene Mapping

POPULATION GENETICS: The study of the rules governing the maintenance and transmission of genetic variation in natural populations.

Aristos Aristodimou, Athos Antoniades, Constantinos Pattichis University of Cyprus David Tian, Ann Gledson, John Keane University of Manchester

The Making of the Fittest: Natural Selection and Adaptation

Methods Available for the Analysis of Data from Dominant Molecular Markers

SAS/STAT 14.1 User s Guide. Introduction to Categorical Data Analysis Procedures

Transcription:

Type Package Package FSTpackage June 27, 2017 Title Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotation Scores Version 0.1 Date 2016-12-14 Author Zihuai He Maintainer Zihuai He <zihuai@umich.edu> Functions for sequencing studies allowing for multiple functional annotation scores. Score type tests and an efficient perturbation method are used for individual gene/large gene-set/genome wide analysis. Only summary statistics are needed. License GPL-3 Depends CompQuadForm, SKAT, Matrix, MASS, mvtnorm R topics documented: FST.example........................................ 1 FST.GeneSet.test...................................... 2 FST.prelim......................................... 3 FST.SummaryStat.test................................... 4 FST.test........................................... 5 Index 7 FST.example Data example for FSTest (tests for genetic association allowing for multiple functional annotation scores) The dataset contains outcome variable Y, covariate X, genotype data G, functional scores Z and gene-set ID for each variable GeneSetID. 1

2 FST.GeneSet.test FST.GeneSet.test Test the association between an quantitative/dichotomous outcome variable and a large gene-set by a score type test allowing for multiple functional annotation scores. Once the preliminary work is done using "FST.prelim()", this function tests a specifc gene. FST.GeneSet.test(result.prelim,G,Z,GeneSetID,Gsub.id=NULL,weights=NULL, B=5000,impute.method='fixed') result.prelim G Z GeneSetID Gsub.id weights The output of function "FST.prelim()" Genetic variants in the target gene, an n*p matrix where n is the subject ID and p is the total number of genetic variables. Note that the number of rows in G should be same as the number of subjects. ***The column name should be the variable name, in order to be matched with the GeneSetID. Functional annotation scores, an p*q matrix where p is the total number of genetic variables and q is the number of functional annotation scores. Note that the first column in Z should be all 1 if the users want the original weights of SKAT/burden test to be included. A p*2 matrix indicating the genes in which the variables are located, where the first column is the genes name and the second column is the variables name. The subject id corresponding to the genotype matrix, an n dimensional vector. This is in order to match the phenotype and genotype matrix. The default is NULL, where the order is assumed to be matched with Y, X. A numeric vector of weights for genetic variants (The length should be same as the number of genetic variants in the set). These weights are usually based on minor allele frequencies. The default is NULL, where the beta(1,25) weights are applied. B Number of Bootstrap replicates. The default is 5000. impute.method Choose the imputation method when there is missing genotype. Can be "random", "fixed" or "bestguess". Given the estimated allele frequency, "random" simulates the genotype from binomial distribution; "fixed" uses the genotype expectation; "Best guess" uses the genotype with highest probability. n.marker p.value number of heterozygous SNPs in the SNP set. P-value of the set based generalized score type test.

FST.prelim 3 ## FST.prelim does the preliminary data management. # Input: Y, X (covariates) ## FST.test tests a region. # Input: G (genetic variants), Z (functional annotation scores) and result of FST.prelim # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by d matrix # G: genotype matrix, n by p matrix where n is the total number of subjects # Z: functional annotation matrix, p by q matrix Y<-FST.example$Y;X<-FST.example$X;G<-FST.example$G;Z<-FST.example$Z;GeneSetID<-FST.example$GeneSetID # Preliminary data management result.prelim<-fst.prelim(y,x=x,out_type='d') # test with 5000 bootstrap replicates result<-fst.geneset.test(result.prelim,g,z,genesetid,b=5000) FST.prelim The preliminary data management for FST (functional score tests) Before testing a specific gene using a score type test, this function does the preliminary data management, such as fitting the model under the null hypothesis. FST.prelim(Y, X=NULL, id=null, out_type="c") Y X id out_type The outcome variable, an n*1 matrix where n is the total number of observations An n*d covariates matrix where d is the total number of covariates. The subject id. This is used to match the genotype matrix. The default is NULL, where the a matched phenotype and genotype matrix is assumed. Type of outcome variable. Can be either "C" for continuous or "D" for dichotomous. The default is "C". It returns a list used for function FST.test().

4 FST.SummaryStat.test # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by d matrix # G: genotype matrix, n by p matrix where n is the total number of subjects # Z: functional annotation matrix, p by q matrix Y<-FST.example$Y;X<-FST.example$X;G<-FST.example$G # Preliminary data management result.prelim<-fst.prelim(y,x=x) FST.SummaryStat.test Using summary statistics to test the association between an quantitative/dichotomous outcome variable and a gene by a score type test allowing for multiple functional annotation scores. This function tests a specific gene using summary statistics (score vector and its covariance matrix) FST.SummaryStat.test(score,Sigma,Z,weights,B=5000) score Sigma Z weights The score vector of length p, where p is the total number of genetic variables. The p*p covariance matrix of the score vector Functional annotation scores, an p*q matrix where p is the total number of genetic variables and q is the number of functional annotation scores. Note that the first column in Z should be all 1 if the users want the original weights of SKAT/burden test to be included. A numeric vector of weights for genetic variants (The length should be same as the number of genetic variants in the set.). These weights are usually based on minor allele frequencies. B Number of Bootstrap replicates. The default is 5000. p.value P-value of the set based generalized score type test.

FST.test 5 ## FST.SummaryStat.test tests a region. # Input: score (a score vector), Sigma (the covariance matrix of the score vector) score<-fst.example$score;sigma<-fst.example$sigma;z<-fst.example$z;weights<-fst.example$weights # test with 5000 bootstrap replicates result<-fst.summarystat.test(score,sigma,z,weights,b=5000) FST.test Test the association between an quantitative/dichotomous outcome variable and a gene by a score type test allowing for multiple functional annotation scores. Once the preliminary work is done using "FST.prelim()", this function tests a specifc gene. FST.test(result.prelim,G,Z,Gsub.id=NULL,weights=NULL,B=5000,impute.method='fixed') result.prelim G Z Gsub.id weights The output of function "FST.prelim()" Genetic variants in the target gene, an n*p matrix where n is the subject ID and p is the total number of genetic variables. Note that the number of rows in G should be same as the number of subjects. Functional annotation scores, an p*q matrix where p is the total number of genetic variables and q is the number of functional annotation scores. Note that the first column in Z should be all 1 if the users want the original weights of SKAT/burden test to be included. The subject id corresponding to the genotype matrix, an n dimensional vector. This is in order to match the phenotype and genotype matrix. The default is NULL, where the order is assumed to be matched with Y, X. A numeric vector of weights for genetic variants (The length should be same as the number of genetic variants in the set.). These weights are usually based on minor allele frequencies. The default is NULL, where the beta(1,25) weights are applied. B Number of Bootstrap replicates. The default is 5000. impute.method Choose the imputation method when there is missing genotype. Can be "random", "fixed" or "bestguess". Given the estimated allele frequency, "random" simulates the genotype from binomial distribution; "fixed" uses the genotype expectation; "Best guess" uses the genotype with highest probability.

6 FST.test n.marker p.value number of heterozygous SNPs in the SNP set. P-value of the set based generalized score type test. ## FST.prelim does the preliminary data management. # Input: Y, X (covariates) ## FST.test tests a region. # Input: G (genetic variants), Z (functional annotation scores) and result of FST.prelim # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by d matrix # G: genotype matrix, n by p matrix where n is the total number of subjects # Z: functional annotation matrix, p by q matrix Y<-FST.example$Y;X<-FST.example$X;G<-FST.example$G;Z<-FST.example$Z # Preliminary data management result.prelim<-fst.prelim(y,x=x,out_type='d') # test with 5000 bootstrap replicates result<-fst.test(result.prelim,g,z,b=5000)

Index Topic datasets FST.example, 1 Topic preliminary work FST.prelim, 3 Topic test FST.GeneSet.test, 2 FST.SummaryStat.test, 4 FST.test, 5 FST.example, 1 FST.GeneSet.test, 2 FST.prelim, 3 FST.SummaryStat.test, 4 FST.test, 5 7