VEGAS2: Gene-based test software using 1000 Genomes reference sets. User Manual


Version: 16:09:002
Date: 16th September 2014
By Aniket Mishra, Stuart Macgregor
Statistical Genetics Group, QIMR Berghofer Medical Research Institute, Brisbane, Australia

Content

1. Introduction
2. Web-Interface
3. Offline Interface
   a. Installation
   b. Input files
   c. Options
   d. Example
4. References

Introduction

VEGAS2 is an upgrade of the gene-based association testing package VEGAS [1]. It performs gene-based tests using a simulation approach and considers all SNPs with maf>0.01 provided by the 1000 Genomes Project [2]. SNPs are assigned to genes whose coordinates are based on the hg19 assembly (for a full description of how the genelist file was made, refer to the Methods section of the VEGAS2 paper). Web-based and offline versions are provided. Both are easy to use and require only a two-column input text file, with SNP ids in the first column and p-values in the second.

One significant application of VEGAS2 is performing gene-based tests on X-chromosome GWAS summary data while accounting for linkage disequilibrium (LD) between SNPs. Allele frequencies may differ between male and female samples, so VEGAS2 provides a sex option that lets the user specify either Males or Females from the 1000G populations when calculating the LD matrix. Stratifying by sex may, however, lead to a loss of power, and an observed difference in allele frequency could simply reflect sampling error. By default, VEGAS2 assumes no allele frequency difference between males and females and performs the analysis using all individuals irrespective of sex (Example A).

To give the flexibility to incorporate regulatory SNPs in gene testing, five different gene definitions are provided (see Options). The numbers of SNPs assigned under the different gene definitions for the 1000 Genomes phase 1 populations are provided in Table 1.

Table 1. Number of unique SNPs* per gene definition and population [counts missing]
Gene definitions (columns): 0kbloc, 10kbloc, 20kbloc, 50kbloc, 0kbldbin
Populations (rows): 1000GEURO, 1000GASN, 1000GAFR, 1000GAMR
*These are the numbers of unique SNPs assigned to one or more hg19 genes. The SNPs are filtered with maf>0.01 and hwe p-value> [value missing].

Web-Based Interface

The web-based version of VEGAS2 can be accessed through [URL missing].

Offline interface

The VEGAS2 offline version is developed in the Perl scripting language for Unix/Linux-based operating systems. It can be downloaded from [URL missing].

Installation

Download the archive, then unzip and untar it using the following commands:

$ tar zxvf zvegas2offline.tgz
$ cd VEGAS2offline

In this directory, you will find two executable files:

1. VEGAS2.pl
2. VEGAS2.config

and two directories:

a. VEGAS2database
b. VEGAS2scripts

You can move the two directories wherever you like and then provide their paths as arguments when running VEGAS2.config, as follows:

$ sh vegas2.config /path/of/vegas2database /path/of/vegas2scripts

For example:

$ sh vegas2.config /scratch/aniketm/vegas2database /home/aniketm/bin/vegas2scripts

This edits the required paths in the vegas2.pl file present in the working directory. You can then move the VEGAS2 executable file wherever you like and delete the VEGAS2 folder, the VEGAS2.config file, etc.

Input files

a. SNP id and p-values: a two-column, whitespace-separated text file. The first column lists 1000G SNP ids and the second column lists the respective GWAS p-values. This file must have no header and must not contain NAs.
   e.g. [ten example rows of rs id and p-value; values missing]

b. Genelist file: a one-column text file listing gene symbols, e.g.
   PIK3R1
   SEMA3E
   PTCHD1-AS
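Because the SNP file must have exactly two whitespace-separated columns, no header, and no NAs, it can help to check a file before running VEGAS2. The following is a minimal sketch in Python and is not part of VEGAS2. The helper name `check_snp_pvalue_lines`, the choice to skip blank lines, and the requirement that p-values lie in (0, 1] are assumptions made for illustration.

```python
import math


def check_snp_pvalue_lines(lines):
    """Return a list of (line_number, problem) for a SNP/p-value input file.

    Hypothetical helper, not part of VEGAS2. Assumes each valid line is
    '<snp_id> <p_value>' with 0 < p_value <= 1; blank lines are skipped.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are harmless
        if len(fields) != 2:
            problems.append((number, "expected 2 columns, found %d" % len(fields)))
            continue
        try:
            p_value = float(fields[1])
        except ValueError:
            # Catches header rows and 'NA' entries, which are not numbers.
            problems.append((number, "p-value is not a number: %r" % fields[1]))
            continue
        if math.isnan(p_value) or not (0.0 < p_value <= 1.0):
            problems.append((number, "p-value outside (0, 1]: %r" % fields[1]))
    return problems
```

The function accepts any iterable of lines, so an open file handle for test_vegas2input.txt can be passed directly; an empty list means no problems were found.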

Options

-pop
    Specifies which 1000G phase 1 population to use: 1000GEURO (European), 1000GASN (Asian), 1000GAMR (American) or 1000GAFR (African). To use 1000 Genomes phase 3 population data, the options are 1000GEUROPhase3 (European), 1000GASNPhase3 (Asian), 1000GAMRPhase3 (American) and 1000GAFRPhase3 (African).

-subpop
    Specifies the sub reference population: EURO (default), CEU, GBR, IBS, FIN or TSI for the 1000GEURO reference population; ASN, CHB, CHS or JPT for the 1000GASN reference population; AFR, ASW, LWK or YRI for the 1000GAFR reference population; AMR, CLM, MXL or PUR for the 1000GAMR reference population.

-genesize
    Specifies which gene definition to use. Five options are available: 0kbloc (default), 10kbloc, 20kbloc, 50kbloc and 0kbldbin.

-chr
    Runs vegas2 on a specific chromosome, from 1 to 23 (23 is the X chromosome).

-genelist
    Runs vegas2 on a specific list of genes.

-top
    Tells vegas2 to perform the top-percentage test, which considers only the specified percentage of top SNPs.

-bestsnp
    Tells vegas2 to perform the best SNP test.

-sex
    Provided for X-chromosome analysis. Tells vegas2 to consider either male (default) or female 1000G individuals when making the LD matrix for simulations. Example A uses the value BothMnF, which analyses all individuals irrespective of sex.

-max
    Tells VEGAS2 the maximum number of simulations to perform. It must be above 1e6.

-adjust
    Produces genomic-inflation-corrected p-values. It creates one more file, <outfile>.corrected.

-out
    Tells VEGAS2 the output file name.

Note: Do not provide the -chr and -genelist options together. Similarly, -top and -bestsnp will not work together.

Example

The VEGAS2offline directory contains two example (test) files:

1. test_vegas2input.txt: two-column vegas2 input file.
2. genelist.txt: genelist input file for the -genelist parameter in vegas2.

(A) Default usage:

$ /path/of/vegas2 <SNPPvaluefile> -pop 1000GEURO -subpop EURO -genesize 0kbloc -top 100 -sex BothMnF -max [value missing] -out genebased.v2out

$ ./vegas2 test_vegas2input.txt -pop 1000GEURO -subpop EURO -genesize 0kbloc -top 100 -sex BothMnF -max [value missing] -out genebased.v2out

Chr Gene nsnps nsims Start Stop Test Pvalue TopSNP TopSNP-pvalue
[output rows for FSIP1, NF1, RAB11FIP4, NF2, ZMAT5, DMD, PTCHD1-AS and two LOC genes; values missing]

(B) Use of the bestsnp option on chromosome 22, on the Finnish (FIN) population with the 10 kb upstream/downstream gene definition:

$ perl /path/of/vegas2.pl <SNPPvaluefile> -pop 1000GEURO -subpop FIN -genesize 10kbloc -chr 22 -bestsnp

$ ./vegas2 test_vegas2input.txt -pop 1000GEURO -subpop FIN -genesize 10kbloc -chr 22 -bestsnp

Chr Gene nsnps nsims Start Stop Test Pvalue TopSNP TopSNP-pvalue
[output rows for CABP7, NF2 and ZMAT5; values missing]

(C) Use of the top test to consider the top 5% of SNPs on a genelist, on the TSI European population with the 0kbldbin gene definition:

$ perl /path/of/vegas2.pl <SNPPvaluefile> -genelist <Genelist file> -pop 1000GEURO -subpop TSI -genesize 0kbldbin -top 5

$ ./vegas2 test_vegas2input.txt -genelist genelist.txt -pop 1000GEURO -subpop TSI -genesize 0kbldbin -top 5

Chr Gene nsnps nsims Start Stop Test Pvalue TopSNP TopSNP-pvalue
[output rows for PTCHD1-AS, PIK3R1 and SEMA3E; values missing]

(D) Use of the top test to consider the top 20% of SNPs on chromosome 23 (X-chromosome), on the all-European population with the 20kbloc gene definition, using Males:

$ perl /path/of/vegas2.pl <SNPPvaluefile> -top 20 -genesize 20kbloc -chr 23 -sex Males

$ ./vegas2 test_vegas2input.txt -top 20 -genesize 20kbloc -chr 23 -sex Males

Chr Gene nsnps nsims Start Stop Test Pvalue TopSNP TopSNP-pvalue
[output rows for DMD and PTCHD1-AS; values missing]

(E) Use of the best SNP test on chromosome 23 (X-chromosome), on the European CEU population with the 50kbloc gene definition, using Females:

$ perl /path/of/vegas2.pl <SNPPvaluefile> -bestsnp -pop 1000GEURO -subpop CEU -genesize 50kbloc -chr 23 -sex Females

$ ./vegas2 test_vegas2input.txt -bestsnp -pop 1000GEURO -subpop CEU -genesize 50kbloc -chr 23 -sex Females

Chr Gene nsnps nsims Start Stop Test Pvalue TopSNP TopSNP-pvalue
[output rows for BGN, DMD, HAUS7, PTCHD1-AS and ZFP92; values missing]
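The manual describes VEGAS2 as a simulation-based gene test that accounts for LD between SNPs. As a rough illustration of that general idea, and not VEGAS2's own code, the sketch below converts SNP p-values to 1-df chi-squared statistics, sums the top percentage of them, and compares that sum with sums obtained from correlated normal draws whose correlation matrix is the LD matrix. All function names, the rounding rule for `top_percent`, and the use of a plain exceedance proportion as the empirical p-value are assumptions. The LD matrix is supplied by the caller here and must be positive definite, whereas VEGAS2 derives it from the chosen 1000G reference population.

```python
import math
import random
from statistics import NormalDist


def p_to_chisq(p):
    """1-df chi-squared statistic whose upper-tail probability is p."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower tail is numerically stable for small p
    return z * z


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def gene_statistic(chisq_values, top_percent=100):
    """Sum the largest top_percent of per-SNP statistics (always at least one).

    Assumption: the number of SNPs kept is rounded up.
    """
    keep = max(1, (top_percent * len(chisq_values) + 99) // 100)
    return sum(sorted(chisq_values, reverse=True)[:keep])


def simulate_gene_pvalue(p_values, ld_matrix, n_sims=1000, top_percent=100, seed=0):
    """Empirical gene p-value: share of simulated statistics >= the observed one."""
    rng = random.Random(seed)
    observed = gene_statistic([p_to_chisq(p) for p in p_values], top_percent)
    lower = cholesky(ld_matrix)
    n = len(p_values)
    exceed = 0
    for _ in range(n_sims):
        indep = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Multiplying by the Cholesky factor gives draws correlated as in ld_matrix.
        correlated = [sum(lower[i][k] * indep[k] for k in range(i + 1)) for i in range(n)]
        if gene_statistic([z * z for z in correlated], top_percent) >= observed:
            exceed += 1
    return exceed / n_sims
```

In this sketch `top_percent=100` uses every SNP in the gene, and a `top_percent` small enough to keep a single SNP compares the largest statistic only, which is how a best-SNP style test would behave under these assumptions.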

References

1. Liu JZ, McRae AF, Nyholt DR, Medland SE, Wray NR, Brown KM, Investigators A, Hayward NK, Montgomery GW, Visscher PM, et al: A versatile gene-based test for genome-wide association studies. Am J Hum Genet 2010, 87.
2. 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA: An integrated map of genetic variation from 1,092 human genomes. Nature 2012, 491:56-65.


More information

ИСПОЛЬЗОВАНИ Е ЧИПАТОРОВ. Клиническая лаборатория

ИСПОЛЬЗОВАНИ Е ЧИПАТОРОВ. Клиническая лаборатория ИСПОЛЬЗОВАНИ Е ЧИПАТОРОВ Клиническая лаборатория 1 2 DISTRIBUTED VS CONSOLIDATED One machine per lab Optimized usage time and PI control Sample prep and data analysis is done inside the lab All equipment

More information

A review of intelligence GWAS hits: their relationship to country IQ and the issue of spatial autocorrelation.

A review of intelligence GWAS hits: their relationship to country IQ and the issue of spatial autocorrelation. A review of intelligence GWAS hits: their relationship to country IQ and the issue of spatial autocorrelation. Davide Piffer email: pifferdavide@gmail.com Abstract A review of published intelligence GWA

More information

, Veronique Bataille 1, Alessia Visconti 1 5,6,7,24. , Gibran Hemani

, Veronique Bataille 1, Alessia Visconti 1 5,6,7,24. , Gibran Hemani SUPPLEMENTARY INFORMATION Letters https://doi.org/10.1038/s41588-018-0100-5 In the format provided by the authors and unedited. Genome-wide association meta-analysis of individuals of European ancestry

More information

Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications

Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications Ranajit Chakraborty, Ph.D. Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications Overview Some brief remarks about SNPs Haploblock structure of SNPs in the human genome Criteria

More information

PEAS: Package for Elementary Analysis of SNP data

PEAS: Package for Elementary Analysis of SNP data PEAS: Package for Elementary Analysis of SNP data Designed by Shuhua Xu and Li Jin Code by Shuhua Xu with contributions from Sanchit Gupta January 8, 2010 Contact information Shuhua Xu Ph.D., CAS-MPG Partner

More information

Lecture 2: Height in Plants, Animals, and Humans. Michael Gore lecture notes Tucson Winter Institute version 18 Jan 2013

Lecture 2: Height in Plants, Animals, and Humans. Michael Gore lecture notes Tucson Winter Institute version 18 Jan 2013 Lecture 2: Height in Plants, Animals, and Humans Michael Gore lecture notes Tucson Winter Institute version 18 Jan 2013 Is height a polygenic trait? http://en.wikipedia.org/wiki/gregor_mendel Case Study

More information

SeattleSNPs Interactive Tutorial: Database Inteface Entrez, dbsnp, HapMap, Perlegen

SeattleSNPs Interactive Tutorial: Database Inteface Entrez, dbsnp, HapMap, Perlegen SeattleSNPs Interactive Tutorial: Database Inteface Entrez, dbsnp, HapMap, Perlegen The tutorial is designed to take you through the steps necessary to access SNP data from the primary database resources:

More information

Comparison of the levels of diversity between coldspots (CS) and highly recombining regions (HRRs) for SNPs in the FCQ data set.

Comparison of the levels of diversity between coldspots (CS) and highly recombining regions (HRRs) for SNPs in the FCQ data set. Supplementary Figure 1 Comparison of the levels of diversity between coldspots (CS) and highly recombining regions (HRRs) for SNPs in the FCQ data set. Odds ratios (ORs) are computed to compare SNP density

More information

Package traser. R topics documented: April 23, Type Package

Package traser. R topics documented: April 23, Type Package Type Package Package traser April 23, 2016 Title GWAS trait-associated SNP enrichment analyses in genomic intervals Version 1.0.0 Depends R (>= 3.2.0),GenomicRanges,IRanges,BSgenome.Hsapiens.UCSC.hg19

More information

Release Notes. JMP Genomics. Version 3.1

Release Notes. JMP Genomics. Version 3.1 JMP Genomics Version 3.1 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP. A Business Unit of SAS SAS Campus Drive

More information

Supplementary Note: Detecting population structure in rare variant data

Supplementary Note: Detecting population structure in rare variant data Supplementary Note: Detecting population structure in rare variant data Inferring ancestry from genetic data is a common problem in both population and medical genetic studies, and many methods exist to

More information

Evidence of selection on human stature inferred from spatial distribution of allele frequencies.

Evidence of selection on human stature inferred from spatial distribution of allele frequencies. Evidence of selection on human stature inferred from spatial distribution of allele frequencies. 1 Davide Piffer Abstract Spatial patterns of allele frequencies reveal a clear signal of natural (or sexual)

More information

Bionano Solve Theory of Operation: Variant Annotation Pipeline

Bionano Solve Theory of Operation: Variant Annotation Pipeline Bionano Solve Theory of Operation: Variant Annotation Pipeline Document Number: 30190 Document Revision: B For Research Use Only. Not for use in diagnostic procedures. Copyright 2018 Bionano Genomics,

More information

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics Genetic Variation and Genome- Wide Association Studies Keyan Salari, MD/PhD Candidate Department of Genetics How many of you did the readings before class? A. Yes, of course! B. Started, but didn t get

More information

Package traser. July 19, 2018

Package traser. July 19, 2018 Type Package Package traser July 19, 2018 Title GWAS trait-associated SNP enrichment analyses in genomic intervals Version 1.10.0 Depends R (>= 3.2.0),GenomicRanges,IRanges,BSgenome.Hsapiens.UCSC.hg19

More information

Computational Workflows for Genome-Wide Association Study: I

Computational Workflows for Genome-Wide Association Study: I Computational Workflows for Genome-Wide Association Study: I Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 16, 2014 Outline 1 Outline 2 3 Monogenic Mendelian Diseases

More information

Package MADGiC. July 24, 2014

Package MADGiC. July 24, 2014 Package MADGiC July 24, 2014 Type Package Title Fits an empirical Bayesian hierarchical model to obtain posterior probabilities that each gene is a driver. The model accounts for (1) frequency of mutation

More information

ARTICLE Population-Genetic Properties of Differentiated Human Copy-Number Polymorphisms

ARTICLE Population-Genetic Properties of Differentiated Human Copy-Number Polymorphisms ARTICLE Population-Genetic Properties of Differentiated Human Copy-Number Polymorphisms Catarina D. Campbell, 1 Nick Sampas, 2 Anya Tsalenko, 2 Peter H. Sudmant, 1 Jeffrey M. Kidd, 1,3 Maika Malig, 1 Tiffany

More information

In silico variant analysis: Challenges and Pitfalls

In silico variant analysis: Challenges and Pitfalls In silico variant analysis: Challenges and Pitfalls Fiona Cunningham Variation annotation coordinator EMBL-EBI www.ensembl.org Sequencing -> Variants -> Interpretation Structural variants SNP? In-dels

More information

GENOME-WIDE data sets from worldwide panels of

GENOME-WIDE data sets from worldwide panels of Copyright Ó 2010 by the Genetics Society of America DOI: 10.1534/genetics.110.116681 Population Structure With Localized Haplotype Clusters Sharon R. Browning*,1 and Bruce S. Weir *Department of Statistics,

More information

FSuite: exploiting inbreeding in dense SNP chip and exome data

FSuite: exploiting inbreeding in dense SNP chip and exome data FSuite: exploiting inbreeding in dense SNP chip and exome data Steven Gazal, Mourad Sahbatou, Marie-Claude Babron, Emmanuelle Génin, Anne-Louise Leutenegger To cite this version: Steven Gazal, Mourad Sahbatou,

More information

Supplementary Table 1. Idd13 candidate interval supporting human LTC-ICs.

Supplementary Table 1. Idd13 candidate interval supporting human LTC-ICs. Supplementary Table 1. Idd13 candidate interval supporting human LTC-ICs. Chr Start position Genomic marker EnsEMBL gene ID Gene symbol Primer 1 (5 3 ) Primer 2 (5 3 ) 2 128675293 ENSMUSG00000027387 Zc3h8

More information

Data quality control in genetic case-control association studies

Data quality control in genetic case-control association studies Europe PMC Funders Group Author Manuscript Published in final edited form as: Nat Protoc. 2010 September ; 5(9): 1564 1573. doi:10.1038/nprot.2010.116. Data quality control in genetic case-control association

More information

Ensembl Tools. EBI is an Outstation of the European Molecular Biology Laboratory.

Ensembl Tools. EBI is an Outstation of the European Molecular Biology Laboratory. Ensembl Tools EBI is an Outstation of the European Molecular Biology Laboratory. Questions? We ve muted all the mics Ask questions in the Chat box in the webinar interface I will check the Chat box periodically

More information

GENOME WIDE ASSOCIATION STUDY OF INSECT BITE HYPERSENSITIVITY IN TWO POPULATION OF ICELANDIC HORSES

GENOME WIDE ASSOCIATION STUDY OF INSECT BITE HYPERSENSITIVITY IN TWO POPULATION OF ICELANDIC HORSES GENOME WIDE ASSOCIATION STUDY OF INSECT BITE HYPERSENSITIVITY IN TWO POPULATION OF ICELANDIC HORSES Merina Shrestha, Anouk Schurink, Susanne Eriksson, Lisa Andersson, Tomas Bergström, Bart Ducro, Gabriella

More information

The PLINK example GWAS analysed by PLINK and Sib-pair

The PLINK example GWAS analysed by PLINK and Sib-pair The PLINK example GWAS analysed by PLINK and Sib-pair David Duffy Genetic Epidemiology Laboratory Introduction Overview of development of Sib-pair PLINK v. Sib-pair Overview of Sib-pair An extensible platform

More information