Computational methods for the analysis of rare variants

Computational methods for the analysis of rare variants Shamil Sunyaev Harvard-M.I.T. Health Sciences & Technology Division

Combine all non-synonymous variants in a single test
Theory:
1) Most new missense mutations are functional (mutagenesis, population genetics, comparative genomics)
2) Most new missense mutations are only weakly deleterious (population genetics)
3) Most functional missense mutations are likely to influence phenotype in the same direction (mutagenesis, medical genetics)
Data: multiple candidate gene studies (HDL-C, LDL-C, triglycerides, BMI, blood pressure, colorectal adenomas)
Kryukov et al., PNAS 2009

Combining variants in a single test
[Figure: disease vs. control groups; functional variants and neutral variants]

How do we know that the variant is functional?
Population genetics; bioinformatics
[Figure: probability that the variant is functional]

Most functional mutations are under selective pressure even if the trait is not

Allele frequency is informative about selective pressure
What is the optimal way to incorporate allele frequency information into a burden test?
Is there a natural threshold of allele frequency?
Is there an optimal way to weight allelic variants with respect to their allele frequency?

Allele frequency has to be taken into account when combining variants
        Cases  Controls
SNP1        1         0
SNP4        1         0
SNP7        1         0
SNP8        0         1
SNP2        2         2
SNP6        4         2
SNP3       10         1
SNP5      195       210
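
The counts in the table can be turned into a small worked example. The sketch below pools case and control allele counts over SNPs whose total count is at or below a cutoff, then asks whether cases carry more than half of the pooled alleles. The one-sided binomial test and the assumption of equal numbers of cases and controls (null probability 0.5) are illustrative choices that do not come from the slides, and `collapsed_counts` and `binomial_upper_tail` are hypothetical helper names.

```python
from math import comb

# (cases, controls) allele counts per SNP, copied from the slide table.
COUNTS = {
    "SNP1": (1, 0), "SNP4": (1, 0), "SNP7": (1, 0), "SNP8": (0, 1),
    "SNP2": (2, 2), "SNP6": (4, 2), "SNP3": (10, 1), "SNP5": (195, 210),
}

def collapsed_counts(counts, max_total):
    """Pool case and control counts over SNPs with total count <= max_total."""
    kept = [c for c in counts.values() if c[0] + c[1] <= max_total]
    return sum(c[0] for c in kept), sum(c[1] for c in kept)

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p).

    ASSUMPTION: equal numbers of cases and controls, so under the null each
    pooled allele falls in a case with probability 0.5.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Scan a few cutoffs on the total allele count (a stand-in for allele frequency).
for max_total in (1, 4, 6, 11, 405):
    cases, controls = collapsed_counts(COUNTS, max_total)
    p_value = binomial_upper_tail(cases, cases + controls)
    print(f"total <= {max_total:3d}: cases={cases:3d} controls={controls:3d} p={p_value:.4f}")
```

Pooling every SNP gives 214 case alleles against 216 control alleles and no signal, because the common SNP5 dominates. Restricting to SNPs with at most 11 copies gives 19 against 6, a one-sided p of about 0.007 under the stated assumption.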

Probability that a variant is functionally significant given its allele frequency
However, this dependence is not robust with respect to s 0!

Goldilocks alleles
Special case in terms of study design: alleles of large effect that are frequent enough to be followed up individually in a larger population sample. Such Goldilocks alleles are observed in the simulations.
There is no optimal and robust weighting scheme or optimal threshold!

Variable threshold (VT) approach
        Cases  Controls
SNP1        1         0
SNP4        1         0
SNP7        1         0
SNP8        0         1
SNP2        2         2
SNP6        4         2
SNP3       10         1
SNP5      195       210

Variable Threshold (VT) approach
[Figure: z-score vs. allele frequency threshold, with the maximum marked for the data and for permutations]
z(T) is the z-score of a regression across samples of phenotypes vs. counts of alleles with frequency below threshold T. We maximize z(T) over T. Type I error is controlled by permutations.
Price, Kryukov et al., AJHG 2010
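
The procedure described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' software. It assumes genotypes are stored as one row of allele counts per SNP, takes candidate thresholds from the observed allele frequencies, and uses "frequency at or below an observed value" in place of "below T". The function names and the toy data are hypothetical.

```python
import math
import random

def z_score(genotypes, phenotypes, keep):
    """z for the SNPs flagged in `keep`: sum C_ij (pi_j - mean) / sqrt(sum C_ij^2)."""
    mean_pi = sum(phenotypes) / len(phenotypes)
    num = 0.0
    den = 0.0
    for flag, row in zip(keep, genotypes):
        if not flag:
            continue
        for count, pi in zip(row, phenotypes):
            num += count * (pi - mean_pi)
            den += count * count
    return num / math.sqrt(den) if den > 0 else 0.0

def vt_statistic(genotypes, phenotypes):
    """Maximum of z(T) over thresholds T taken from the observed allele frequencies."""
    n = len(phenotypes)
    freqs = [sum(row) / (2 * n) for row in genotypes]  # diploid samples assumed
    best = -math.inf
    for t in sorted(set(freqs)):
        keep = [f <= t for f in freqs]
        best = max(best, z_score(genotypes, phenotypes, keep))
    return best

def vt_test(genotypes, phenotypes, n_perm=1000, seed=0):
    """Permutation p-value for the maximized z-score (phenotype labels shuffled)."""
    rng = random.Random(seed)
    observed = vt_statistic(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if vt_statistic(genotypes, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): 3 SNPs x 6 individuals, phenotype 1 = case, 0 = control.
toy_genotypes = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1],
]
toy_phenotypes = [1, 1, 1, 0, 0, 0]
observed, p_value = vt_test(toy_genotypes, toy_phenotypes, n_perm=200)
print(observed, p_value)
```

Because the maximization over T is repeated inside every permutation, the p-value already accounts for having scanned many thresholds.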

Beyond allele frequency
Allele frequency does not capture all information about selective pressure. It is possible to further stratify allelic variants of the same frequency.
Alleles under selective pressure are, on average, younger than neutral alleles. This is true even if they are at the same frequency.

Allelic age is informative even conditionally on frequency

Intuition behind the effect
Allelic age can be measured by the density of younger mutations and by LD decay.

Bioinformatics predictions

Does the mutation fit the pattern of past evolution?
human  VVSTADLCAPSSTKLDER
       A
dog    FVSTSELCAGSTTRLEER
fish   FLSTSELCVPSTLKVNEK
Statistical issues:
- sequences are related by phylogeny
- generally, we have too few sequences
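
The question on this slide can be illustrated with a naive column check on the three sequences shown. The sketch asks only whether a substituted residue already occurs at the same aligned position in another species. It does not address either statistical issue listed on the slide, and it is not how PolyPhen-2 or any other published tool scores variants. The positions and substitutions in the example are hypothetical, chosen for illustration.

```python
# Aligned sequences copied from the slide (all 18 residues long).
ALIGNMENT = {
    "human": "VVSTADLCAPSSTKLDER",
    "dog":   "FVSTSELCAGSTTRLEER",
    "fish":  "FLSTSELCVPSTLKVNEK",
}

def column(alignment, pos):
    """Residue of each species at 0-based alignment position `pos`."""
    return {species: seq[pos] for species, seq in alignment.items()}

def is_invariant(alignment, pos):
    """True if every species carries the same residue at `pos`."""
    return len(set(column(alignment, pos).values())) == 1

def seen_in_orthologs(alignment, pos, mutant, reference="human"):
    """True if `mutant` occurs at `pos` in any species other than `reference`."""
    return any(
        seq[pos] == mutant
        for species, seq in alignment.items()
        if species != reference
    )

# Hypothetical substitutions, for illustration only.
print(column(ALIGNMENT, 8), seen_in_orthologs(ALIGNMENT, 8, "V"))  # variable column
print(column(ALIGNMENT, 7), seen_in_orthologs(ALIGNMENT, 7, "A"))  # invariant column
```

With only three sequences, a residue that is absent from a column is weak evidence, which is why the slide flags the small number of sequences as a statistical issue.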

Predictions based on protein structure
Most pathogenic mutations affect protein stability. Heuristic structural parameters help with predictions (albeit less than comparative genomics).

PolyPhen-2 www.genetics.bwh.harvard.edu/pph2 Adzhubei, et al. Nature Methods 2010

Incorporation of PolyPhen-2 scores into VT-test
Kumar S et al. Genome Research 2009
We incorporated weights approximating these distributions into the test for alleles with frequency below 1%:
\[
z(T) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} \xi_i^T C_{ij} \left( \pi_j - \bar{\pi} \right)}{\left[ \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \xi_i^T C_{ij} \right)^2 \right]^{1/2}}
\]
Price, Kryukov et al., AJHG 2010
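
To show how per-variant weights enter the statistic, here is a minimal sketch of the weighted z-score for one fixed threshold. It assumes that `xi[i]` already holds the weight of SNP i when its frequency is below the threshold and 0 otherwise. The weights used here are made-up numbers standing in for values derived from prediction scores, and the function name and toy data are hypothetical.

```python
import math

def weighted_z(genotypes, phenotypes, xi):
    """Weighted z-score for one threshold.

    genotypes[i][j] : allele count C_ij of SNP i in individual j
    phenotypes[j]   : phenotype pi_j
    xi[i]           : weight of SNP i (0 if the SNP is excluded at this threshold)
    """
    mean_pi = sum(phenotypes) / len(phenotypes)
    num = sum(
        w * c * (pi - mean_pi)
        for w, row in zip(xi, genotypes)
        for c, pi in zip(row, phenotypes)
    )
    den = sum((w * c) ** 2 for w, row in zip(xi, genotypes) for c in row)
    return num / math.sqrt(den) if den > 0 else 0.0

# Toy data (hypothetical): one singleton in a case, one singleton in a control.
toy_genotypes = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
toy_phenotypes = [1, 1, 1, 0, 0, 0]

print(weighted_z(toy_genotypes, toy_phenotypes, [1.0, 1.0]))  # equal weights
print(weighted_z(toy_genotypes, toy_phenotypes, [0.9, 0.1]))  # made-up weights
```

With equal weights the two singletons cancel. When the variant seen in a case gets the larger weight, the statistic becomes positive, which is the mechanism by which functional predictions can add power.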

Power
Power of Various Approaches Using Simulated Phenotypes
             T1     T5     WE     VT     VTP
α = 0.001    0.135  0.180  0.097  0.205  0.257
α = 0.05     0.547  0.502  0.545  0.598  0.686
Results for Three Empirical Data Sets
       T1     T5         WE         VT         VTP
TG     0.013  0.00009    0.0024     0.00036    0.00006
T1D    0.001  0.0000006  0.0000009  0.0000012  0.0000001
BMI    0.041  0.064      0.014      0.013      0.0027

This is a general approach
Prediction scores can be easily incorporated into other tests such as WSS, CMC, RVE, C-alpha, etc.
Other available prediction methods include SIFT, Pmut, SNAP, SNPs3D, GERP, etc.
The same approach can take into account sequencing errors.

We are likely to be underpowered to detect the effect of individual genes on traits.
Combining signal from multiple genes can dramatically increase power.
Although we do not know the right pathways, we can attempt constructing them automatically.
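
As a generic illustration of why pooling gene-level evidence helps, the sketch below combines per-gene p-values with Fisher's method. This is a textbook technique swapped in for illustration; it is not the SNIPE method named on the following slide, and it assumes the gene-level tests are independent. The p-values are hypothetical.

```python
import math

def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k d.o.f.

    For an even number of degrees of freedom the chi-square tail has the
    closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    ASSUMPTION: the k tests are independent.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 0.0
    for i in range(k):
        total += term
        term *= half_x / (i + 1)
    return math.exp(-half_x) * total

# Hypothetical gene-level p-values from one candidate pathway.
gene_p = [0.04, 0.06, 0.05, 0.08]
print(fisher_combine(gene_p))
```

For these four made-up values, none of which is below 0.01 on its own, the combined p-value is about 0.003.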

http://string.embl.de/ SNIPE method

Acknowledgments
The lab: Gregory Kryukov, Alex Shpunt, Adam Kiezun, Ivan Adzhubei, Victor Spirin, Steffen Schmidt, David Nusinow, Daniel Jordan
HSPH, BWH, MGH: Alkes Price, Lee-Jen Wei, Paul de Bakker, Shaun Purcell

Literature and software
Price AL, Kryukov GV, de Bakker PI, Purcell SM, Staples J, Wei LJ, Sunyaev SR. Pooled Association Tests for Rare Variants in Exon-Resequencing Studies. Am J Hum Genet. 2010
Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nature Methods 2010
Kryukov GV, Shpunt A, Stamatoyannopoulos JA, Sunyaev SR. Power of deep, all-exon resequencing for discovery of human trait genes. PNAS 2009
Kryukov GV, Pennacchio LA, Sunyaev SR. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am J Hum Genet. 2007
genetics.bwh.harvard.edu/pph2
genetics.bwh.harvard.edu/rare_variants