Genomic variants in Next Generation Sequencing data and their importance for biomedicine

1 Genomic variants in Next Generation Sequencing data and their importance for biomedicine
Sophia Derdak
NGSchool 2016, August 20, 2016

2 The CNAG
Situated in the Parc Científic de Barcelona
Staff > 60, > 50% informatics
Directed by Ivo Gut
- Agilent's Certified Service Provider in Spain, Target Enrichment System (since 2014)
- Illumina Certified Service Provider (since 2013)
- ISO 9001 quality certification (since 2014)
- ISO accreditation (in process)
Mission: Carry out projects in genome analysis that will lead to significant improvements in people's health and quality of life, in collaboration with the Spanish, European and International Research Community.
Research interests: Cancer Genomics, Disease Gene Identification and Infectious Diseases, Agrogenomics and Model Organisms
Partners in International and National Projects: ICGC, GEUVADIS, BLUEPRINT, SYBARIS, RD-CONNECT, ESGI, AirPROM, EV, READNA, ...
Organisms: Citrus, Melon, Olive, Lynx, Primates, Mouse, Drosophila, ...

3 What are genomic variants?
Genomic Variants in Biomedicine, NGSchool 2016, August 20, 2016, Sophia Derdak, CNAG

4 Variants in the diploid genome
Identification of genetic differences in comparison to a reference
Reference (haploid) vs. the true diploid genome of the sample:
- ref/ref: homozygous reference
- ref/alt: heterozygous
- alt/alt: homozygous alternative
>99% of the genomic positions are not variant positions
~ variant positions / base position human genome
< variant positions / base position human exome

5 Genome sequencing: the experimental workflow
Workflow figure labels:
- Biological material
- DNA sample: pool of cells, 2 homologous chromosomes per cell
- DNA fragments (overlapping)
- PCR, amplified DNA fragments
- dilution, load on sequencer
- raw read data
- reference sequence, alignments (aligned read pairs)
- variant calling

6 Sequencing-by-synthesis
From flowcell to computer: Base calling
- The sequence of colors read for each cluster in each cycle is translated to nucleotide sequence

7 Mapping of reads to the reference sequence
Figure: two 100bp reads mapped to the reference (adapted from Wikipedia)

8 From alignments to variants: the bioinformatic workflow
- .bam alignment file
- bam processing: remove duplicates, inspect alignments, realignment
- + .fasta reference genome file
- variant calling (+ study-specific information)
- .pileup position file, .vcf variant call file
- annotations, filtering

9 Variant calling: Variants in the sequencing data
Identification of genetic differences in comparison to a reference
Reference (haploid), the true diploid genome of the sample, and the aligned sequencing data derived from the sample:
- ref/ref homozygous reference: 0% alternative allele
- ref/alt heterozygous: 50% alternative allele
- alt/alt homozygous alternative: 100% alternative allele
- ???: 20% alternative allele
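
The relation between alternative allele percentage and genotype on this slide can be sketched in a few lines of Python. This is only a naive illustration: the function names and the cut-off values are assumptions made for the example, and real variant callers use probabilistic models rather than fixed thresholds.

```python
def alt_fraction(ref_count, alt_count):
    """Fraction of reads at a position that support the alternative allele."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_count / total


def naive_genotype(ref_count, alt_count):
    """Assign a genotype from read counts using illustrative cut-offs.

    The cut-offs (0.05, 0.30, 0.70, 0.95) are assumptions for this sketch,
    not values taken from the slides.
    """
    frac = alt_fraction(ref_count, alt_count)
    if frac <= 0.05:
        return "ref/ref"
    if frac >= 0.95:
        return "alt/alt"
    if 0.30 <= frac <= 0.70:
        return "ref/alt"
    return "ambiguous"


print(naive_genotype(30, 0))   # no alternative reads
print(naive_genotype(15, 15))  # about half alternative reads
print(naive_genotype(24, 6))   # 20% alternative reads, the "???" case
```

With these cut-offs the 20% case is reported as ambiguous, which is the situation the "???" on the slide points at: low-fraction signals may be sequencing errors, sample mixtures, or real sub-clonal variants.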

10 Variant calling: Extract variants only
Aligned sequencing data derived from the sample, reduced to a list of variant positions
Altmann A et al. Hum Genet 2012: A beginners guide to SNP calling from high-throughput DNA-sequencing data.

11 Bioinformatic tools for single nucleotide variant calling
- samtools + bcftools (Sanger Institute, UK, and Broad Institute, US)
- Genome Analysis Tool Kit (GATK) (Broad Institute, US)
- VarScan (Washington University)
- Platypus (Wellcome Trust Centre, UK)
- freebayes (Boston College, US)
Keep in mind that different tools use different algorithms and thresholds, and results may vary A LOT.
Pabinger S et al. Briefings in Bioinformatics 2013: A survey of tools for variant analysis of next-generation genome sequencing data.

12 Benchmarks for variant calling
- N x Whole Genome FASTQs from Illumina Platinum Genomes analyzed with the pipeline
- Results compared independently for SNPs and INDELs against the NIST reference set: Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Zook et al. Nat Biotechnol Mar;32(3)
Results (on reliably callable region = 70% of the genome):
Table columns: Feature, Mapper, Variant Caller, TP, FP, FN, Specificity, Sensitivity
Table rows: SNVs (GEM3, GATK-HC), Deletions (GEM3, GATK-HC), Insertions (GEM3, GATK-HC)
S. Laurie, S. Derdak, R. Tonda, S. Beltran
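
As a reminder of how benchmark counts turn into summary metrics, here is a minimal sketch. It computes sensitivity and precision from TP, FP and FN. The counts in the example are made up, and the exact definition behind the slide's Specificity column is not stated in the transcript, so precision is shown here only as one commonly reported companion metric.

```python
def sensitivity(tp, fn):
    """Share of true variants in the reference set that the pipeline found."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of the pipeline's calls that are present in the reference set."""
    return tp / (tp + fp)


# Hypothetical counts, for illustration only (not the benchmark results)
tp, fp, fn = 9900, 50, 100
print(f"sensitivity = {sensitivity(tp, fn):.4f}")
print(f"precision   = {precision(tp, fp):.4f}")
```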

13 The coverage represents the number of times a base of the sample genome (or target region) is read during sequencing. A higher coverage provides higher power for data analysis.
Figure: genome with sequenced fragments (reads)
How to get a higher coverage:
- mainly by loading more sequencing units (indexes, lanes, entire flowcells) with the same library preparation
Typical coverage numbers (in CNAG projects):
- whole genome: 30x
- exome: x
- custom gene panel capture: >1000x
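
Mean coverage follows directly from the definition above: total sequenced bases divided by the size of the genome or target region. The sketch below assumes 100 bp reads and a human genome size of roughly 3 billion bases; both numbers and the function names are illustrative assumptions.

```python
import math


def mean_coverage(n_reads, read_length, target_size):
    """Average number of times each base of the target is read."""
    return n_reads * read_length / target_size


def reads_needed(coverage, read_length, target_size):
    """Number of reads required to reach a desired mean coverage."""
    return math.ceil(coverage * target_size / read_length)


GENOME_SIZE = 3_000_000_000  # approximate human genome size (assumption)

print(mean_coverage(900_000_000, 100, GENOME_SIZE))
print(reads_needed(30, 100, GENOME_SIZE))
```

This is a mean over the whole target; actual per-position coverage varies, which is one reason why filtering for well-covered positions matters later in the workflow.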

14 Sequencing data analysis is all probabilities!
"I believe that we do not know anything for certain, but everything probably." Christiaan Huygens
Probabilities are attached to:
- base calling (base qualities in the fastq files)
- contig order in the reference assembly
- reference sequence (not yet...)
- read alignment (mapping quality)
- variant position (variant and genotype quality)
Expressed as:
- p-values
- probability likelihoods
- PHRED scores
Plato, ~400 BC
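
PHRED scores encode an error probability p as Q = -10 * log10(p). The following sketch converts between the two and decodes a FASTQ quality string, assuming the common Phred+33 ASCII encoding; the function names are chosen for this example.

```python
import math


def phred_to_error_prob(q):
    """Error probability for a PHRED score, p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """PHRED score for an error probability, Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def decode_fastq_qualities(quality_string, offset=33):
    """Turn a FASTQ quality string into a list of PHRED scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


print(phred_to_error_prob(20))         # 1 error in 100
print(phred_to_error_prob(30))         # 1 error in 1000
print(decode_fastq_qualities("I5!"))   # per-base scores of a short quality string
```

The same scale is used for base qualities, mapping qualities, and variant and genotype qualities, so a score of 30 always means a 1 in 1000 chance that the respective call is wrong.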

15 Single sample variant results
- raw vcf file ("all variants")
- mostly experiment-independent technical and quality filtering (well-covered positions with confident alternative allele)
- filtered vcf file ("good quality variants")
Columns: CHR POS REF ALT GT
- CHR: chromosome
- POS: position on the chromosome
- REF: sequence in the reference genome
- ALT: alternative sequence detected in the sample
- GT: genotype in the (diploid) sample
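
A technical filter of this kind can be sketched as follows. The example parses the fixed columns of a VCF data line and keeps variants above a quality and depth threshold. The thresholds, the example records, and the choice to read depth from the INFO field `DP` are assumptions for illustration; real pipelines typically use more criteria.

```python
def parse_vcf_line(line):
    """Parse the first eight tab-separated columns of a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
    # INFO is a ';'-separated list of KEY=VALUE pairs (flags have no value)
    info_map = dict(item.partition("=")[::2] for item in info.split(";"))
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": float(qual) if qual != "." else 0.0,
        "depth": int(info_map.get("DP", 0)),
    }


def passes_filters(variant, min_qual=30.0, min_depth=10):
    """Keep well-covered positions with a confident call (illustrative thresholds)."""
    return variant["qual"] >= min_qual and variant["depth"] >= min_depth


# Made-up example records
raw_lines = [
    "\t".join(["chr1", "1000", ".", "A", "G", "50", "PASS", "DP=25"]),
    "\t".join(["chr1", "2000", ".", "C", "T", "12", "PASS", "DP=40"]),
    "\t".join(["chr2", "3000", ".", "G", "A", "60", "PASS", "DP=4"]),
]
good = [v for v in map(parse_vcf_line, raw_lines) if passes_filters(v)]
print([(v["chrom"], v["pos"]) for v in good])
```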

16 Variant calling: multi-sample analyses
- Cancer: Somatic variants
- Inheritance and de-novo variants
- Affected vs. control group

17 Compare two samples of the same individual (e.g. tumor-normal)
- vcf file ("good quality variants - all genotypes")
- Definition of somatic variant, consider sample purity information
- Select variants with genotype in the normal and in the tumor sample
- Additionally, select alternative allele frequency thresholds for normal and tumor sample using AC
- filtered vcf file ("somatic variants")
Columns: CHR POS REF ALT GT_normal AC_normal GT_tumor AC_tumor
Allele counts shown on the slide (ref,alt), in slide order:
AC_normal  AC_tumor
37,1       44,7
53,0       53,16
19,1       21,7
11,0       21,5
20,1       23,8
28,0       29,7
12,0       19,5
10,0       16,5
15,0       31,6
19,1       30,6
22,1       24,5
12,0       4,6
11,0       16,5
11,0       19,5
11,0       18,4
12,0       12,5
10,0       24,5
13,0       29,8
13,0       22,4
39,0       36,10
- CHR: chromosome
- POS: position on the chromosome
- REF: sequence in the reference genome
- ALT: alternative sequence detected in the sample
- GT: genotype in the (diploid) sample, per sample
- AC: allele count, number of (ref, alt) bases, per sample
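
The selection by alternative allele frequency can be illustrated with the (ref, alt) counts listed on this slide. The sketch below assumes that the normal and tumor counts pair up in the order they appear, and the three thresholds are hypothetical values chosen for the example; in practice they depend on coverage and sample purity.

```python
def alt_fraction(counts):
    """Alternative allele fraction from a (ref, alt) read count pair."""
    ref, alt = counts
    total = ref + alt
    return alt / total if total else 0.0


def is_somatic_candidate(normal, tumor,
                         max_normal_frac=0.03,
                         min_tumor_frac=0.10,
                         min_tumor_alt_reads=4):
    """Alternative allele (almost) absent in normal, clearly present in tumor.

    All three thresholds are illustrative assumptions.
    """
    return (alt_fraction(normal) <= max_normal_frac
            and alt_fraction(tumor) >= min_tumor_frac
            and tumor[1] >= min_tumor_alt_reads)


# First three (normal, tumor) count pairs from the slide, paired by order
pairs = [((37, 1), (44, 7)), ((53, 0), (53, 16)), ((19, 1), (21, 7))]
for normal, tumor in pairs:
    print(normal, tumor,
          round(alt_fraction(tumor), 2),
          is_somatic_candidate(normal, tumor))
```

The third pair is rejected with these settings because one alternative read out of twenty in the normal sample exceeds the 3% limit, which shows how strongly the outcome depends on the chosen thresholds.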

18 Compare three samples of a pedigree
- vcf file ("good quality variants - all genotypes")
- Apply model of inheritance: e.g. autosomal recessive
- Select variants with genotype in the parents and in the daughter
- filtered vcf file ("recessively inherited variants")
Columns: CHR POS REF ALT GT_daughter GT_father GT_mother
- CHR: chromosome
- POS: position on the chromosome
- REF: sequence in the reference genome
- ALT: alternative sequence detected in the sample
- GT: genotype in the (diploid) sample, per sample
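
An autosomal recessive model can be expressed as a simple genotype pattern. The sketch below assumes an affected daughter with unaffected carrier parents, biallelic sites, and genotypes written in the VCF convention (`0` = reference allele, `1` = alternative allele); the function names are chosen for this example.

```python
import re


def alleles(gt):
    """Split a VCF genotype string such as '0/1' or '1|0' into a sorted tuple."""
    return tuple(sorted(re.split(r"[/|]", gt)))


def fits_autosomal_recessive(daughter, father, mother):
    """Daughter homozygous alternative, both parents heterozygous carriers."""
    het = ("0", "1")
    return (alleles(daughter) == ("1", "1")
            and alleles(father) == het
            and alleles(mother) == het)


print(fits_autosomal_recessive("1/1", "0/1", "1|0"))  # matches the model
print(fits_autosomal_recessive("0/1", "0/1", "0/1"))  # daughter is only a carrier
```

Compound heterozygous cases (two different variants in the same gene, one from each parent) also fit recessive inheritance but need a gene-level check that this per-position sketch does not cover.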

19 Compare three samples of a pedigree
- vcf file ("good quality variants - all genotypes")
- Apply model of inheritance: e.g. de-novo
- Select variants with genotype in the parents and in the daughter
- filtered vcf file ("de-novo variants")
Columns: CHR POS REF ALT GT_daughter GT_father GT_mother
- CHR: chromosome
- POS: position on the chromosome
- REF: sequence in the reference genome
- ALT: alternative sequence detected in the sample
- GT: genotype in the (diploid) sample, per sample
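
For the de-novo model the genotype pattern changes: the alternative allele is seen in the daughter but in neither parent. A minimal sketch, again assuming biallelic sites and VCF-style genotype strings, with hypothetical function names:

```python
import re


def alleles(gt):
    """Split a VCF genotype string such as '0/1' or '0|1' into a sorted tuple."""
    return tuple(sorted(re.split(r"[/|]", gt)))


def is_de_novo_candidate(daughter, father, mother):
    """Both parents homozygous reference, daughter carries the alternative allele."""
    hom_ref = ("0", "0")
    return (alleles(father) == hom_ref
            and alleles(mother) == hom_ref
            and "1" in alleles(daughter))


print(is_de_novo_candidate("0/1", "0/0", "0/0"))  # candidate
print(is_de_novo_candidate("0/1", "0/1", "0/0"))  # inherited from the father
```

Because true de-novo variants are rare, most hits from such a filter tend to be artefacts, for example a parental variant missed due to low coverage, so the parental read counts are worth inspecting for each candidate.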

20 A real-world success story of finding the causative variant

21 Causative variant for inherited retinal dystrophy?
Figure: variants on chr17
Discard variants because:
- they have low technical quality
- they are frequent polymorphisms in the general population
- they do not have a protein-coding effect
de Castro-Miró M et al. PLOS One 2014: Combined Genetic and High-Throughput Strategies for Molecular Diagnosis of Inherited Retinal Dystrophies.
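
The three discard criteria can be chained into one filter. The sketch below uses made-up variant records; the field names, the effect labels, and the 1% population frequency cut-off are assumptions for illustration and are not taken from the cited study.

```python
# Effect labels considered protein-coding in this sketch (hypothetical labels)
CODING_EFFECTS = {"missense", "nonsense", "frameshift", "splice_site"}


def keep_variant(variant, max_pop_freq=0.01):
    """Apply the three discard rules from the slide in order."""
    if not variant["good_quality"]:
        return False  # low technical quality
    freq = variant["pop_freq"]  # None means: not seen in the population database
    if freq is not None and freq > max_pop_freq:
        return False  # frequent polymorphism in the general population
    return variant["effect"] in CODING_EFFECTS  # needs a protein-coding effect


# Made-up records, for illustration only
variants = [
    {"id": "v1", "good_quality": False, "pop_freq": None, "effect": "missense"},
    {"id": "v2", "good_quality": True, "pop_freq": 0.25, "effect": "missense"},
    {"id": "v3", "good_quality": True, "pop_freq": None, "effect": "intronic"},
    {"id": "v4", "good_quality": True, "pop_freq": None, "effect": "missense"},
]
print([v["id"] for v in variants if keep_variant(v)])
```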

22 Causative variant for inherited retinal dystrophy?
Figure: remaining candidate variant on chr17
de Castro-Miró M et al. PLOS One 2014: Combined Genetic and High-Throughput Strategies for Molecular Diagnosis of Inherited Retinal Dystrophies.

23 Causative variant for inherited retinal dystrophy?
Candidate variant on chr17
Annotations at gene level:
- ENSEMBL functional annotations (genes, transcripts, coding sequences): GUCY2D
- UCSC genome browser, GeneCards, ... (tissue specificity of gene function): Retina
de Castro-Miró M et al. PLOS One 2014: Combined Genetic and High-Throughput Strategies for Molecular Diagnosis of Inherited Retinal Dystrophies.

24 Causative variant for inherited retinal dystrophy?
Candidate variant on chr17
Annotations at gene level:
- ENSEMBL functional annotations (genes, transcripts, coding sequences): GUCY2D
- UCSC genome browser, GeneCards, ... (tissue specificity of gene function): Retina
Annotations at position level:
- base change: c.2747>
- amino acid change: p.i916
- ExAC (> exomes) general population frequency: Variant not annotated
- Deleteriousness predictions (SIFT, PolyPhen2, CADD): damaging, probably damaging
de Castro-Miró M et al. PLOS One 2014: Combined Genetic and High-Throughput Strategies for Molecular Diagnosis of Inherited Retinal Dystrophies.

25 Causative variant for inherited retinal dystrophy?
Look up a gene in a Genome Browser:

26 More genomics resources
Variants inside candidate genes or genomic regions are interesting variants.
- HGMD :: Human Gene Mutation Database (Cardiff University and Biobase GmbH)
- OMIM :: Online Mendelian Inheritance in Man (Johns Hopkins University)
- Orphanet :: The portal for rare diseases and orphan drugs (INSERM, France)
- ClinVar :: Information about relationships among variation and human health (NCBI)
- Disease-specific databases and publications (e.g. COSMIC database for cancer)
- Genetic linkage studies
Helpful when studying a case with a previously described disease phenotype.
The OMIM database is available and may be queried at:
The Orphanet database is available at:
ClinVar is available at:
The COSMIC Catalogue for somatic mutations in cancer is available at:

27 What else can genomic variants tell us?

28 Somatic variants in cancer may help to decide on one therapy or another
Example from one of my collaborations (not clinical routine yet):
Renal cell carcinoma + lung + bone metastasis
Figure labels, in slide order:
- Mutation in AKT-mTOR-TSC1-TSC2
- Mutation in BRAF
- everolimus
- sorafenib
- Mutation in c-MET or AXL
- cabozantinib
- sunitinib
Bellmunt J et al. Clin Genitourin Cancer 2014: Sequential targeted therapy after pazopanib therapy in patients with metastatic renal cell cancer: efficacy and toxicity.

29 Complex diseases: more complex than coding effect and inheritance
One of the methods to assess complex disease is GWAS, Genome Wide Association Studies.
- Look for genetic polymorphisms (not necessarily coding!) that associate with the trait
- in 1000's of samples: cases and controls, perform statistical tests
- Results can be single position or hotspot region around a position: Manhattan plots
e.g. Autism spectrum disorders
Figure panels: highly associated polymorphisms on chr5; zoom in: the hotspot is in the intergenic region; close-up of the hotspot
Wang K et al. Nature 2009: Common genetic variants on 5p14.1 associate with autism spectrum disorders.
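
The statistical test performed per polymorphism can be as simple as a chi-square test on allele counts in cases versus controls. The sketch below is one basic option, not necessarily the test used in the cited study, and the counts are made up. The p-value uses the identity that a chi-square statistic with one degree of freedom has survival function erfc(sqrt(x / 2)).

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts at one polymorphic position
chi2, p = allelic_chi_square(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}, -log10(p) = {-math.log10(p):.2f}")
```

A Manhattan plot shows -log10(p) of such a test for every tested position along the chromosomes. Because very many positions are tested, the significance threshold has to be corrected for multiple testing.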

30 Pharmacogenomics
Pharmacogenomics is an emerging field that combines genetics with pharmacokinetics and pharmacodynamics of drugs.
- to understand genetic polymorphisms among patients
- to study the effect of these polymorphisms on the activity of the enzyme metabolizing the drug
- to develop more accurate drug dosing in order to avoid intoxication or insufficient drug action
Drugs and genes with variants affecting drug action:
- Warfarin (inhibitor of blood coagulation): VKORC1 and CYP2C9
- Irinotecan (cancer): UGT1A1
- Thiopurine drugs (autoimmune disorders): TPMT and ITPA
Lee JW et al. Clin Genet 2014: The emerging era of pharmacogenomics: current successes, future potential, and challenges.

31 Incidental findings
- originally coined in the field of radiology
A clinically relevant incidental DNA variation can be defined as a verified DNA variation that has a proven medically relevant phenotype not directly related to the condition being studied for research. It is an unforeseen clinical finding relevant to the individual research participant involved (and possibly to the family of the participant).
- to be discussed in the field of bioethics
Should the participant (or the participant's physician) be informed about the incidental finding?
Does it make a difference whether the incidentally discovered genetic variant points at a disease with a therapy available or not?
Properly informed consent for the study participants must explain the possibility of finding an incidental DNA variation (especially in whole genome sequencing).
Krier JB and Green RC. Curr Protoc Hum Genet. 2013: Management of Incidental Findings in Clinical Genomic Sequencing.

32 Recreational genotyping?
- Health reports
- ancestry-related genetic reports
- uninterpreted raw genetic data
- oddities: Does fresh cilantro taste like soap to you? Yes / No / Not sure
Eriksson N et al. arXiv 2012: A genetic variant near olfactory receptor genes influences cilantro preference.