Functional Annotation and Prioritization of Whole Exome and Whole Genome Sequencing Variants. Mulin Jun Li


1 Functional Annotation and Prioritization of Whole Exome and Whole Genome Sequencing Variants Mulin Jun Li

2 Contents
Genetic variants, potential functional impact and general annotation
Regulatory variant function prediction and prioritization
Splicing variant function prediction
Variants affecting miRNA targeting
Deleterious effect prediction for nonsynonymous variants
Effect of synonymous variants
Loss-of-function variant and structural variant annotation
Current challenges in variant annotation and functional prediction

3 Genetic Variant An alteration in the DNA nucleotide sequence. The term "variant" can describe an alteration that may be benign, pathogenic, or of unknown significance, and it is increasingly used in place of the term "mutation". By origin: germline variants (heritable) and somatic mutations. By type: SNVs, deletions, insertions, copy number variations, inversions, and translocations.

4 Functional/Pathogenic Variant Geneticists, evolutionary biologists, and molecular biologists apply distinct approaches, evaluating different and complementary lines of evidence for functional DNA alterations. Deciphering functional and disease-causal variants/genes is pivotal to understanding the mechanisms of trait development and the pathogenesis of disease.

5 From sequencing to variant calling

6 What next? QC? Filtering? Disease inheritance model? Annotation & prioritization?

7 General variant annotation and prioritization pipeline

8 Gene/Knowledge-based annotation Tool: ANNOVAR. Alternative tools: KGGSeq, SnpEff, VEP
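At its core, gene-based annotation asks where a variant falls relative to a gene model. The sketch below illustrates that idea only; it is not how ANNOVAR, KGGSeq, SnpEff or VEP are implemented. It assumes a single hypothetical gene on one chromosome, with 1-based inclusive coordinates and exons that are sorted and non-overlapping. The function name `annotate_region` is illustrative.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical toy gene model: exons as (start, end), 1-based inclusive,
# sorted by start and non-overlapping, all on one chromosome.
Exon = Tuple[int, int]

def annotate_region(pos: int, gene_start: int, gene_end: int,
                    exons: List[Exon]) -> str:
    """Classify a position as exonic, intronic or intergenic for one gene."""
    if pos < gene_start or pos > gene_end:
        return "intergenic"
    starts = [s for s, _ in exons]
    i = bisect_right(starts, pos) - 1  # index of last exon starting at or before pos
    if i >= 0 and exons[i][0] <= pos <= exons[i][1]:
        return "exonic"
    return "intronic"

exons = [(100, 150), (300, 350)]
print(annotate_region(120, 100, 500, exons))  # exonic
print(annotate_region(200, 100, 500, exons))  # intronic
```

Real annotators also report UTR, splicing, upstream/downstream and transcript-level consequences, which this sketch deliberately leaves out.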

9 Variant affecting different biological processes

10 Regulatory variant 98% of human DNA is non-protein-coding, and 88% of genome-wide association study (GWAS) trait-associated variants are located in noncoding regions of the human genome. Predicting and prioritizing functional regulatory variants is a major challenge in current human genetics.

11 Quantitative detection of regulatory variant

12 Existing regulatory variant prediction tools Many sophisticated computational methods have been developed to predict and prioritize functional or pathogenic regulatory variants: GWAS3D, GWAVA, FunSeq2, CADD, deltaSVM, Eigen, fitCons. Using machine learning on functional annotation data, current methods have achieved acceptable performance in their respective validations. However, these scores performed poorly when compared with in vivo saturation mutagenesis of regulatory regions.

13 Consistency of existing methods Spearman's rank correlation among eight tools shows weak agreement. Compiled resource of scores from eight regulatory variant prediction algorithms: dbNCFP: ftp:// /prvcs/dbncfp. Evaluation on our newly collected set of 5,454 causal regulatory variants: curated dataset (HGMD, ClinVar, ORegAnno) + GWAS fine-mapping dataset.
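The consistency check in the slide above is Spearman's rank correlation between two tools' scores on the same variants: it is the Pearson correlation of the rank vectors, and tied scores get the average of their ranks. The following is a minimal pure-Python sketch of that definition; the function names are illustrative, and in practice `scipy.stats.spearmanr` computes the same quantity.

```python
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors.
    Assumes neither input is constant (zero rank variance)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Applied to every pair of the eight tools' score columns, this yields a correlation matrix; values near 0 indicate the weak agreement described on the slide.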

14 Performance of composite model Ten-fold cross-validation; independent curated dataset. The composite strategy outperformed all existing methods. Now integrated into KGGSeq. Alternative tools: functional: FunSeq2, GWAVA; pathogenic: LINSIGHT, Eigen
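Ten-fold cross-validation splits the labelled variants into ten folds. Each fold serves once as the test set while the model is trained on the other nine. The sketch below only generates the (train, test) index splits, assuming the variants are indexed 0..n-1. The model training step and the function name `k_fold_indices` are placeholders, not part of the composite method itself.

```python
import random
from typing import List, Tuple

def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 once, deal them into k folds, and return
    one (train, test) pair per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]      # round-robin assignment to folds
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits

for train, test in k_fold_indices(25, k=10):
    pass  # fit the model on `train`, score the variants in `test`
```

With strongly imbalanced causal/non-causal labels, a stratified split, which keeps the class ratio in each fold, is usually preferable. This sketch does not stratify.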

15 Context-dependent functional genomics

16 Regulatory context Tissue/cell type-specific genomic signal is an important feature for prioritizing context-dependent variant effects. [Figure: a GWAS SNP and the causal SNP near Gene A and Gene B, with H3K27ac and DNase signals in the most relevant cell A, relevant cell B, and cell C] How can regulatory variants be better predicted using tissue/cell type-specific signals?

17 Cell type-specific logit model Select the optimal predictors by AIC for each cell type, then compare the significant predictors across the cell type-specific models and pick the top overlapping predictors for the general model.
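The criterion in the slide above is the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of fitted parameters and L is the maximized likelihood. For a logit model, ln L is the Bernoulli log-likelihood of the 0/1 labels under the fitted probabilities. The sketch below computes AIC from such probabilities. It assumes the logit model has already been fitted elsewhere; the function names are illustrative.

```python
import math
from typing import Sequence

def logistic_log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli log-likelihood of 0/1 labels y under fitted probabilities p."""
    eps = 1e-12  # guard against log(0) for probabilities of exactly 0 or 1
    return sum(
        yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
        for yi, pi in zip(y, p)
    )

def aic(y: Sequence[int], p: Sequence[float], n_params: int) -> float:
    """AIC = 2k - 2 ln L; among candidate models, lower is better."""
    return 2 * n_params - 2 * logistic_log_likelihood(y, p)
```

To use it for selection, fit one logit model per candidate predictor subset for a given cell type and keep the subset with the smallest AIC. The 2k term penalizes adding predictors that do not improve the fit enough.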

18 Performance of cell type-specific model Now integrated into KGGSeq. Alternative tools: fitCons, deltaSVM

19 Alternative Splicing Exons can be differentially included in the mature mRNA products during splicing. This process, called alternative splicing (AS), is one of the predominant mechanisms for generating distinct mRNA isoforms from a single gene.

20 Splicing Variant Approximately 99% of mammalian splice sites follow the GT-AG rule, ~0.9% are GC-AG, and ~0.09% are AT-AC. Genetic variants that disrupt or create the highly conserved splice site dinucleotide motifs can alter splicing patterns and produce alternative mRNA and protein isoforms. Mutations affecting splice site dinucleotides represent a large class of human disease mutations.
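The dinucleotide rule above can be checked directly. The donor is the first two bases of the intron and the acceptor is the last two, and a substitution is flagged when a recognised pair (GT-AG, GC-AG or AT-AC) becomes unrecognised. This is a minimal sketch that assumes the intron sequence is given on the transcript strand, as DNA, and that the SNV offset is 0-based within the intron. It ignores the wider splice-site context and splicing regulatory elements that scoring tools such as dbscSNV or SPANR take into account.

```python
from typing import Tuple

# Donor/acceptor dinucleotide pairs listed on the slide.
CANONICAL_PAIRS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}

def splice_pair(intron: str) -> Tuple[str, str]:
    """(donor, acceptor) dinucleotides: the first and last two intron bases."""
    return intron[:2].upper(), intron[-2:].upper()

def disrupts_splice_site(intron: str, offset: int, alt: str) -> bool:
    """True if substituting `alt` at 0-based `offset` turns a recognised
    donor/acceptor pair into an unrecognised one."""
    mutated = intron[:offset] + alt + intron[offset + 1:]
    return (splice_pair(intron) in CANONICAL_PAIRS
            and splice_pair(mutated) not in CANONICAL_PAIRS)

intron = "GTAAGTCCCTTTCAG"
print(disrupts_splice_site(intron, 1, "A"))  # True: GT donor becomes GA
print(disrupts_splice_site(intron, 1, "C"))  # False: GC-AG is still recognised
```

For genes on the minus strand, the intron would first need to be reverse-complemented.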

21 Splicing-altering variant prediction tools Compiled resource of ensemble scores for splicing variant prediction: dbscSNV: enetics.com/dbscsnv1.1.zip. Alternative tools: SPANR

22 Functional variants in post-transcriptional regulation: miRNA

23 Variants affecting miRNA targeting Recommended tools: PolymiRTS, miRNASNP
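A common starting point for miRNA targeting is the seed match. A 7mer-m8 site in the target mRNA is the reverse complement of miRNA nucleotides 2-8, so a SNV in a 3'UTR can destroy an existing site or create a new one. The sketch below counts such sites before and after a substitution. It assumes uppercase RNA sequences (A/C/G/U) and a 0-based SNV position in the UTR, and it considers only this one site type. Tools such as PolymiRTS and miRNASNP apply richer rules, and the function names here are illustrative.

```python
def revcomp_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq.upper()))

def seed_site(mirna: str) -> str:
    """7mer-m8 site: reverse complement of miRNA nucleotides 2-8 (1-based)."""
    return revcomp_rna(mirna[1:8])

def count_sites(utr: str, mirna: str) -> int:
    """Number of (possibly overlapping) seed-site occurrences in the UTR."""
    site = seed_site(mirna)
    w = len(site)
    return sum(1 for i in range(len(utr) - w + 1) if utr[i:i + w] == site)

def snv_effect_on_sites(utr: str, pos: int, alt: str, mirna: str) -> str:
    """Compare seed-site counts before and after substituting `alt` at `pos`."""
    mutated = utr[:pos] + alt + utr[pos + 1:]
    before, after = count_sites(utr, mirna), count_sites(mutated, mirna)
    if after < before:
        return "site lost"
    if after > before:
        return "site gained"
    return "no change"

let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7a
print(seed_site(let7a))                                       # CUACCUC
print(snv_effect_on_sites("AAACUACCUCAAA", 4, "G", let7a))    # site lost
```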

24 Function of nonsynonymous variant

25 Prediction of the deleterious effect of missense variants Sequence conservation
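Conservation-based prediction rests on the idea that a missense change at a position that is invariant across homologous sequences is more likely to be deleterious. One simple way to quantify conservation is the Shannon entropy of each column of a multiple sequence alignment, where 0 bits means perfectly conserved. This sketch is only an illustration of that idea. It is not the scoring scheme used by any specific predictor, and it assumes equal-length aligned sequences with "-" as the gap character.

```python
import math
from collections import Counter
from typing import List

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [r for r in column.upper() if r != "-"]
    n = len(residues)
    counts = Counter(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def conservation_profile(alignment: List[str]) -> List[float]:
    """Per-column entropy; lower values mean more conserved positions."""
    return [column_entropy("".join(seq[i] for seq in alignment))
            for i in range(len(alignment[0]))]

aln = ["MKV", "MKL", "MRI", "MKV"]
print(conservation_profile(aln))  # column 0 fully conserved, column 2 most variable
```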

26 A large number of tools for predicting the deleterious effect of missense variants

27 Combined strategy KGGSeq MetaSVM Alternative tools: REVEL, M-CAP
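Combined strategies merge the outputs of many individual predictors into one score. MetaSVM, for example, trains an SVM on them. As a much simpler illustration of combining scores that sit on different scales, the sketch below converts each tool's scores to percentile ranks and averages those ranks per variant. This rank-averaging ensemble is a swapped-in technique, not the MetaSVM, REVEL or M-CAP model. It assumes every tool is oriented so that higher means more deleterious; a tool such as SIFT, where lower is more damaging, would need to be negated first.

```python
from typing import Dict, List

def percentile_ranks(scores: List[float]) -> List[float]:
    """Fraction of scores <= each score, in (0, 1]; O(n^2), fine for a sketch."""
    n = len(scores)
    return [sum(1 for t in scores if t <= s) / n for s in scores]

def combined_score(tool_scores: Dict[str, List[float]]) -> List[float]:
    """Average the per-tool percentile ranks for each variant.
    Every list must cover the same variants in the same order."""
    ranked = [percentile_ranks(v) for v in tool_scores.values()]
    n_var = len(ranked[0])
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(n_var)]

scores = {"toolA": [0.1, 0.9, 0.5], "toolB": [1.0, 3.0, 2.0]}  # hypothetical scores
print(combined_score(scores))  # variant 1 ranks highest in both tools
```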

28 Synonymous variant affecting translation Tools: RNAsnp
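Before asking whether a synonymous variant affects translation or mRNA structure, it has to be recognised as synonymous. The sketch below classifies a coding SNV by translating the reference and alternative codons with the standard genetic code. It also flags stop-gained changes, a common class of loss-of-function variants. It assumes an in-frame DNA coding sequence on the transcript strand and a 0-based position. Real annotators additionally handle alternative transcripts, indels and non-standard codes.

```python
# Standard genetic code, codons ordered by first/second/third base in T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def classify_coding_snv(cds: str, pos: int, alt: str) -> str:
    """Classify an SNV at 0-based `pos` of an in-frame coding sequence."""
    cds = cds.upper()
    start = pos - pos % 3                  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:pos % 3] + alt.upper() + ref_codon[pos % 3 + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"

cds = "ATGGCTTGGTAA"  # toy CDS: Met-Ala-Trp-Stop
print(classify_coding_snv(cds, 5, "C"))  # synonymous (GCT -> GCC, both Ala)
print(classify_coding_snv(cds, 8, "A"))  # stop_gained (TGG -> TGA)
```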

29 Loss-of-function mutation and structural variant annotation Loss-of-function mutations totally disrupt protein function. Most have large effects, but it is hard to predict the functional impact of structural variants. Tools: SVScore

30 Challenges in variant annotation and functional prediction
Efficient variant annotation at whole-genome scale: WES/WGS generates tens of millions of variants per individual (query variant set size: 200K to 10M), and some annotation databases, like CADD and GERP, contain base-wise information for all 3 billion positions (annotation database size: 10K to 3B).
Accurate algorithms for predicting functional regulatory variants are still lacking.
Annotation is lacking for silent variants, such as synonymous variants or variants in noncoding RNA.
Annotating and prioritizing structural variants remains a major problem.
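The scale problem in the slide above is mostly a lookup problem: millions of query positions must be matched against a base-wise score table with up to billions of entries. Scanning the table once per query is hopeless, but if the table is sorted by position, each lookup becomes a binary search costing O(log N). The sketch below assumes a single chromosome whose positions and scores are held in memory as parallel sorted lists. Production tools instead use on-disk indexes, such as tabix over bgzip-compressed files, and the function name here is illustrative.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence

def lookup_scores(db_positions: Sequence[int], db_scores: Sequence[float],
                  query_positions: List[int]) -> List[Optional[float]]:
    """Return the score stored at each query position, or None if absent.
    `db_positions` must be sorted ascending and aligned with `db_scores`."""
    results: List[Optional[float]] = []
    for q in query_positions:
        i = bisect_left(db_positions, q)  # O(log N) binary search
        if i < len(db_positions) and db_positions[i] == q:
            results.append(db_scores[i])
        else:
            results.append(None)
    return results

print(lookup_scores([10, 20, 30], [0.1, 0.2, 0.3], [20, 25, 10, 40]))
# [0.2, None, 0.1, None]
```

If the queries are also sorted, a single merge-style pass over both lists would be linear in their combined length, which is the usual choice for streaming whole-genome files.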

31 Thank you