Calling DNA Variants Steve Laurie Centro Nacional de Analisis Genomico (CNAG-CRG), Barcelona

1 Calling DNA Variants
Steve Laurie, Centro Nacional de Analisis Genomico (CNAG-CRG), Barcelona
Variant Effect Predictor Training Course, Heraklion, 31st October 2016

2 Calling DNA Variants - Overview
What is a variant, and how many do we have?
What do we do with our lab coats on?
What do we do sitting at a computer?
Some real world issues
How well do we do in general?

3 What is a variant?
A variant is any position or region in our sample that differs from the haploid reference genome to which we are comparing it.
Single Nucleotide Variants (SNVs): e.g. A → G; note that a diploid individual may be AA, AG, or GG
Short insertions and deletions (InDels): e.g. TA → TATA (insertion of TA); e.g. CT → C (loss of the T at the third position)
Copy Number Variants (CNVs): tandem duplication of longer regions (~1-100kb) that are typically polymorphic within the population, e.g. AMY1
Structural Variants (SVs): larger still, and often more complex
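The small-variant categories on this slide can be told apart from the reference and alternative alleles alone. The following is a minimal sketch, assuming alleles are given as plain strings in the left-anchored style used by VCF; the function names `classify_variant` and `possible_genotypes` are hypothetical and not part of any tool mentioned in the talk.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a REF/ALT allele pair using allele lengths only."""
    if ref == alt:
        return "no variant"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    # Insertion: the alternative allele extends the reference allele.
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    # Deletion: the alternative allele is a prefix of the reference allele.
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "complex"


def possible_genotypes(ref: str, alt: str) -> list[str]:
    """Unphased diploid genotypes at a biallelic site."""
    return [ref + ref, ref + alt, alt + alt]


# The three examples from the slide.
for ref, alt in [("A", "G"), ("TA", "TATA"), ("CT", "C")]:
    print(ref, alt, classify_variant(ref, alt))

print(possible_genotypes("A", "G"))
```

CNVs and SVs cannot be recognised this way; they are usually detected from read depth or discordant read pairs rather than from a single REF/ALT pair.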


5 How many variants do we have?
Single Nucleotide Variants (SNVs): ~3,500,000-4,000,000 (~30, ,000 exomic)
Short insertions and deletions (InDels): ~300, ,000
Copy Number Variants (CNVs): ~5-10% of genome
Structural Variants (SVs): larger still, and often more complex; ~13% of genome

6-7 NGS Workflow: what do we do with our lab coats on? Kassahn, K. (2013)

8-11 Targeted NGS: Fractionation and Capture
1 - fractionation
2 - hybridisation
3 - enrichment
4 - amplification
Figure from Mardis, E.R. (2012)

12-14 Paired-end Reads
[Figure labels: "Typically bp"; "nt"; "nt"; "~50-100bp"]

15-17 Read-mapping
[Figure labels: "~50-400nt"; "linker"]

18 Mapped WES reads viewed in IGV
[Figure tracks: Coverage, Reads, Exons]

19 Variant Calling: ideal

20 Variant Calling: real world

21-24 Variant Calling Tools
SAMtools
GATK
freebayes
Platypus

25 Variant Calling Tools
Variant calling tools start by calling every potential variant they observe.
This will include true variants, and false positives due to:
  Library preparation artefacts
  PCR artefacts
  Sequencing errors
  Mapping issues
  Algorithm issues
They subsequently apply a number of mechanisms to help separate the true positives from the false positives, and provide metrics.
Currently, you will always have some false positives and some false negatives.

26 Variant Calling Data

27 Annotation of variant call at sample level
These tags are listed in the FORMAT column of the VCF, and there are values for every sample in a multi-sample VCF.

REF ALT QUAL FILTER: G A 999 PASS
INFO: DP=180;VDB= e-01;RPB= e01;AF1=0.5;AC1=4;DP4=43,49,46,37;MQ=35;FQ=999;PV4=0.29,1,1,1
FORMAT: GT:PL:DP:SP:GQ
Sample 1: 0/0:0,135,255:45:0:99
Sample 2: 1/1:255,102,0:34:0:99
Sample 3: 0/1:255,0,255:46:6:99

Tag (Field): Definition
GQ (Genotype Quality): Phred-scaled confidence that the real genotype is that reported versus the next most likely genotype
DP (Depth): Number of reliable base calls at this position
PL (Phred-scaled Likelihood): The likelihood of the possible genotypes (order 0/0, 0/1, 1/1, 0/2, 1/2, 2/2), normalised such that the value for the reported GT is set to 0
SP (Strand Bias p-value): Phred-scaled strand bias p-value for sample
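To make the FORMAT tags concrete, here is a minimal sketch that splits the three sample columns from the slide and recovers the most likely genotype from PL. The helper names are hypothetical. The `gq_from_pl` rule (second-smallest PL, capped at 99) is an assumption based on a common convention; individual callers may compute GQ differently.

```python
def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair FORMAT keys with one sample's colon-separated values."""
    record = dict(zip(format_field.split(":"), sample_field.split(":")))
    record["PL"] = [int(x) for x in record["PL"].split(",")]
    for key in ("DP", "SP", "GQ"):
        record[key] = int(record[key])
    return record


def genotype_order(n_alleles: int = 2) -> list[str]:
    """Diploid genotype ordering used for PL: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ..."""
    return [f"{i}/{j}" for j in range(n_alleles) for i in range(j + 1)]


def most_likely_genotype(pl: list[int], n_alleles: int = 2) -> str:
    """The genotype whose normalised PL is smallest (0 for the reported GT)."""
    return genotype_order(n_alleles)[pl.index(min(pl))]


def gq_from_pl(pl: list[int], cap: int = 99) -> int:
    """Assumed convention: GQ is the second-smallest PL, capped at 99."""
    return min(sorted(pl)[1], cap)


FORMAT = "GT:PL:DP:SP:GQ"
SAMPLES = ["0/0:0,135,255:45:0:99", "1/1:255,102,0:34:0:99", "0/1:255,0,255:46:6:99"]

for column in SAMPLES:
    sample = parse_sample(FORMAT, column)
    print(sample["GT"], most_likely_genotype(sample["PL"]), gq_from_pl(sample["PL"]))
```

For all three samples the genotype with PL 0 matches the reported GT, which is the normalisation described in the PL definition.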

28 Annotation of variant call at pedigree level
This information is taken from the FILTER/INFO fields of the VCF and indicates positions that are failing across samples.

Tag (Field): Definition
sb0.05 / sb0.001 (Strand Bias): there was a significant bias towards variant calls only being observed on one strand across all samples, at <0.05 or <0.001 respectively
tdb0.05 (Tail Distance Bias): there was a significant positional bias within reads for variant calls across all samples at this position
mrd10 / mrd15 (Minimum Read Depth): at least one of the samples had coverage <10 or <15 at this position
msb30 (Maximum Strand Bias): at least one of the samples had a strand bias (SP) >30
map (Mappability): variant observed in a region where we know short reads are difficult to align
SALX=Y (Samples At Least (covered)): SAL10=3 would mean that at least 3 of the samples in the VCF have a read depth of 10 at this position
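The depth and strand-bias flags in this table can be derived from per-sample DP and SP values. The sketch below is one possible reading, not the pipeline's actual implementation: the function names are hypothetical, and `samples_at_least` assumes that SALX counts samples whose depth is X or more.

```python
def pedigree_flags(samples, depth_thresholds=(10, 15), max_sp=30):
    """Return mrd*/msb* flags for a site, given per-sample DP and SP."""
    flags = []
    lowest_depth = min(s["DP"] for s in samples)
    for threshold in depth_thresholds:
        # mrdN: at least one sample has coverage below N.
        if lowest_depth < threshold:
            flags.append(f"mrd{threshold}")
    # msbN: at least one sample has a Phred-scaled strand bias above N.
    if any(s["SP"] > max_sp for s in samples):
        flags.append(f"msb{max_sp}")
    return flags


def samples_at_least(samples, depth):
    """Build the SALX=Y annotation (assumed: Y samples with DP >= X)."""
    count = sum(1 for s in samples if s["DP"] >= depth)
    return f"SAL{depth}={count}"


# DP and SP values of the three samples shown on the sample-level slide.
trio = [{"DP": 45, "SP": 0}, {"DP": 34, "SP": 0}, {"DP": 46, "SP": 6}]
print(pedigree_flags(trio), samples_at_least(trio, 10))

# A made-up site where one sample is poorly covered and strand-biased.
poor = [{"DP": 45, "SP": 0}, {"DP": 8, "SP": 40}, {"DP": 46, "SP": 6}]
print(pedigree_flags(poor), samples_at_least(poor, 10))
```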

29-30 Indel identification
[Figure: raw BWA mapped reads compared with the same reads following local realignment. DePristo, M. et al. (2011)]

31 Indel identification: where exactly?

32 Prioritize variants: advanced technical filtering
Tail distance bias / read position bias
[Figure: three examples annotated "ReadPosRankSum = " (values not preserved); one label reads "No reads spanning this region"]
samtools: VDB field (proportion), PV4 field (p-value)
GATK: ReadPosRankSum field (Z-score)
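A read-position Z-score of the kind reported in ReadPosRankSum can be illustrated with a rank-sum (Mann-Whitney) test comparing where in the read the alternative and reference alleles were observed. This is a simplified sketch using the normal approximation without tie correction; it is not GATK's code, and the function name and input positions are invented for illustration.

```python
from math import sqrt


def rank_sum_z(alt_values, ref_values):
    """Z-score of the rank-sum test; negative when alt values tend to be smaller."""
    combined = sorted(
        [(v, "alt") for v in alt_values] + [(v, "ref") for v in ref_values]
    )
    alt_rank_sum = 0.0
    i = 0
    while i < len(combined):
        # Find the run of tied values and give each the average rank.
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        n_alt_in_run = sum(1 for k in range(i, j) if combined[k][1] == "alt")
        alt_rank_sum += average_rank * n_alt_in_run
        i = j
    n_alt, n_ref = len(alt_values), len(ref_values)
    u_alt = alt_rank_sum - n_alt * (n_alt + 1) / 2
    mean_u = n_alt * n_ref / 2
    sd_u = sqrt(n_alt * n_ref * (n_alt + n_ref + 1) / 12)
    return (u_alt - mean_u) / sd_u


# Invented distances (in bases) from the allele to the nearest read end.
alt_positions = [1, 2, 3, 2, 4]       # alt allele seen only near read ends
ref_positions = [20, 35, 41, 28, 50]  # ref allele seen throughout the read
print(rank_sum_z(alt_positions, ref_positions))
```

A strongly negative score, as in this invented example, would suggest that the alternative allele is supported mainly by bases close to read ends, which is the pattern tail distance bias filters aim to catch.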

33 Prioritize variants: advanced technical filtering
Strand bias
[Figure: two panels labelled "Reads"]
samtools: PV4 field (p-value)
GATK: FS field (Phred-scaled p-value)
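Strand bias is commonly assessed with Fisher's exact test on a 2x2 table of reference/alternative counts by forward/reverse strand. The sketch below applies a two-sided test to the DP4 value shown on the sample-level annotation slide (DP4=43,49,46,37, read as ref-forward, ref-reverse, alt-forward, alt-reverse) and Phred-scales the result. It is an illustration with hypothetical function names and is not guaranteed to reproduce the exact values that samtools or GATK report.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


def phred(p_value):
    """Phred-scale a p-value: -10 * log10(p)."""
    return -10 * log10(max(p_value, 1e-300))


ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in "43,49,46,37".split(","))
p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
print(f"strand bias p-value: {p:.3f}, Phred-scaled: {phred(p):.2f}")
```

A large p-value (small Phred score) means the alternative allele is seen on both strands in roughly the same proportion as the reference allele, so there is no evidence of strand bias at this site.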

34 Benchmark pipeline
Raw Data: Illumina Platinum 50x WGS NA12878; NimbleGen MedExome 90x WES NA12878
Alignment: BWA-MEM v0.7.8; GEM v3.1
Standardise Representation: Sort & Mark Duplicates (Picard) + Indel Realignment (GATK v3.3)
Call Variants:
  FreeBayes v0.9
  HaplotypeCaller v3.3 + GenotypeGVCFs
  SAMtools v1.2: 1) fast (-Bug) 2) slow (-ug)
Minimal Filtering: QUAL >30 for every caller
16 Final Call Set VCFs (8 Genome & 8 Exome)
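The count of 16 call sets follows from combining 2 datasets, 2 aligners and 4 caller configurations, and the minimal filter is a threshold on the QUAL column. A minimal sketch of both, assuming tab-separated VCF text lines; the example records are invented and the function names are hypothetical.

```python
from itertools import product

DATASETS = ["WGS", "WES"]
ALIGNERS = ["BWA-MEM", "GEM3"]
CALLERS = ["FreeBayes", "HaplotypeCaller", "SAMtools fast", "SAMtools slow"]

# Every dataset/aligner/caller combination yields one final VCF.
call_sets = list(product(DATASETS, ALIGNERS, CALLERS))
print(len(call_sets), "call sets")


def passes_min_qual(vcf_line: str, min_qual: float = 30.0) -> bool:
    """Keep header lines and records whose QUAL (6th column) exceeds min_qual."""
    if vcf_line.startswith("#"):
        return True
    qual = vcf_line.rstrip("\n").split("\t")[5]
    return qual != "." and float(qual) > min_qual


def minimal_filter(lines, min_qual: float = 30.0):
    return [line for line in lines if passes_min_qual(line, min_qual)]


# Invented records for illustration only.
example_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tG\tA\t999\tPASS\t.",
    "chr1\t2000\t.\tC\tT\t12.5\t.\t.",
    "chr1\t3000\t.\tTA\tT\t30\t.\t.",
]
kept = minimal_filter(example_vcf)
print(len(kept) - 1, "records kept")
```

Note that the comparison is strict, so a record with QUAL exactly 30 is dropped, matching "QUAL >30" on the slide.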

35 NIST provide calls for the callable regions of the NA12878 genome, i.e. excluding simple repeats, CNVs and known segmental duplications in this sample: 2,191 Mb, 2,915,728 unique variant positions

36 Comparison with NIST set
Intersected our VCFs to the same regions and compared positions in our VCFs with the NIST VCF for concordance at the level of Chr-Pos-Ref-Alt-GT
Ignored positions that were multi-allelic for the alternative allele (~0.15% in NIST)
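Concordance at the Chr-Pos-Ref-Alt-GT level amounts to comparing sets of 5-tuples. The sketch below assumes both call sets have already been restricted to the same regions and that records are available as simple tuples; the data are invented, the function names are hypothetical, and genotype normalisation (ignoring phase and allele order) is an assumption about how GT was compared.

```python
def normalise_gt(gt: str) -> str:
    """Ignore phasing and allele order, so 1|0 and 0/1 compare equal."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def load_calls(records):
    """Build a set of (chrom, pos, ref, alt, gt) keys, skipping multi-allelic sites."""
    calls = set()
    for chrom, pos, ref, alt, gt in records:
        if "," in alt:
            continue  # multi-allelic for the alternative allele: ignored
        calls.add((chrom, pos, ref, alt, normalise_gt(gt)))
    return calls


def compare(truth_records, query_records):
    truth, query = load_calls(truth_records), load_calls(query_records)
    return {
        "TP": len(truth & query),  # call identical in both sets
        "FP": len(query - truth),  # call in our set not found in the truth set
        "FN": len(truth - query),  # call in the truth set not found in ours
    }


# Invented records for illustration only.
truth_set = [
    ("chr1", 100, "G", "A", "0/1"),
    ("chr1", 200, "C", "T", "1/1"),
    ("chr1", 300, "TA", "T", "0/1"),
    ("chr1", 400, "A", "C,G", "1/2"),
]
query_set = [
    ("chr1", 100, "G", "A", "1|0"),
    ("chr1", 200, "C", "T", "0/1"),
    ("chr1", 500, "G", "C", "0/1"),
]
counts = compare(truth_set, query_set)
print(counts)
```

Under this definition a site called with the wrong genotype (chr1:200 in the example) counts once as a false positive and once as a false negative.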

37-39 Comparison with NIST: SNVs, Deletions, Insertions (Whole Genome)
One table per variant type, each with the columns: Dataset, Total Calls, TP, FP, FN, Specificity, Sensitivity, F1 score [values not preserved]
Rows in each table:
  NIST v2.18 Gold Standard
  BWA + FreeBayes
  BWA + HaplotypeCaller
  BWA + SAMtools fast
  BWA + SAMtools normal
  GEM3 + FreeBayes
  GEM3 + HaplotypeCaller
  GEM3 + SAMtools fast
  GEM3 + SAMtools normal
TP: call identical in NIST and CNAG
FP: call in CNAG not found in NIST
FN: call in NIST not found in CNAG
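The comparison tables report TP, FP and FN together with Specificity, Sensitivity and F1 score. As a rough sketch of how such columns can be derived: sensitivity is TP/(TP+FN) and F1 is the harmonic mean of precision and sensitivity. True specificity needs a count of true negatives, which these tables do not list, so the sketch computes precision, TP/(TP+FP), instead; whether the slide's "Specificity" column was calculated that way is an assumption. The counts used are invented.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, sensitivity (recall) and F1 from TP/FP/FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    if precision + sensitivity == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}


# Invented counts for illustration only.
metrics = benchmark_metrics(tp=90, fp=10, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

F1 can equivalently be written as 2TP / (2TP + FP + FN), which is a convenient cross-check when filling in a table by hand.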

40 Consensus between callers: NIST callable region (2,195 Mb)

41 Consensus between callers: non-callable but mappable (594 Mb)

42 WES vs WGS

43 Summary Findings
GEM3.1 is fast, and the resulting variant calls are similar to those for BWA-MEM
All variant callers tested are fairly similar in SNV accuracy
There is much more variety in indel calls
There is not a lot of difference in accuracy between the two SAMtools modes
FreeBayes is very fast, but perhaps not as accurate as HaplotypeCaller for indels

46-47 RD-Connect Genomics Platform
Demos tomorrow at 14h and 16h
D. Piscia, S. Laurie, S. Beltran

48 ISO 9001:2008

49 Acknowledgements
rd-connect.eu
platform.rd-connect.eu

WP1: Coordination, Hanns Lochmüller (Newcastle and TREAT-NMD)
WP2: Patient registries, Domenica Taruscio (ISS and EPIRARE)
WP3: Biobanks, Lucia Monaco (Fondaz. Telethon & EuroBioBank)
WP4: Bioinformatics, Christophe Béroud (INSERM Marseille)
WP5: Unified platform, Ivo Gut (CNAG Barcelona)
WP6: Ethical/legal/social, Mats Hansson (Uppsala)
WP7: Impact/Innovation, Kate Bushby (Newcastle and EUCERD/EJARD)

CNAG: I. Gut, S. Beltran, D. Piscia, S. Laurie, J. Protasio, A. Papakons., I. Martinez, R. Tonda, J.R. Trotta
LUMC: P.B. 't Hoen, M. Roos, R. Raliyaperumal, M. Thompson
CNIO: A. Valencia, V. de la Torre, J.M. Fernández, A. Cañada
U. Aveiro: J.L. Oliveira, P. Lopes, P. Sernaleda
Murdoch U.: M. Bellgard
MME: H. Rehm
AMU: C. Béroud, D. Salgado, J.P. Desvignes
Interactive BioSoftware: A. Blavier, S. Lair
U. of Patras: G. Patrinos
Genesis: S. Zuchner, M. Gonzalez, R. Acosta
EGA: D. Spalding, J. Almeida-King, A. Navarro, J. Rambla
Newcastle U.: H. Lochmüller, R. Thompson, J. Dawson, A. Topf, I. Zaharieva
U. of Toronto: M. Brudno, M. Girdea, S. Dumitriu, O. Buske


More information

Jenny Gu, PhD Strategic Business Development Manager, PacBio

Jenny Gu, PhD Strategic Business Development Manager, PacBio IDT and PacBio joint presentation Characterizing Alzheimer s Disease candidate genes and transcripts with targeted, long-read, single-molecule sequencing Jenny Gu, PhD Strategic Business Development Manager,

More information

MoGUL: Detecting Common Insertions and Deletions in a Population

MoGUL: Detecting Common Insertions and Deletions in a Population MoGUL: Detecting Common Insertions and Deletions in a Population Seunghak Lee 1,2, Eric Xing 2, and Michael Brudno 1,3, 1 Department of Computer Science, University of Toronto, Canada 2 School of Computer

More information

Unravelling the genetic basis of Mayer-Rokitansky- Küster-Hauser syndrome through whole exome sequencing

Unravelling the genetic basis of Mayer-Rokitansky- Küster-Hauser syndrome through whole exome sequencing RESEARCH PROJECTS 2014 Unravelling the genetic basis of Mayer-Rokitansky- Küster-Hauser syndrome through whole exome sequencing Dr Antigone Dimas, Postdoctoral Research Fellow, BSRC Al. Fleming Dr Klelia

More information

Statistical method for Next Generation Sequencing pipeline comparison

Statistical method for Next Generation Sequencing pipeline comparison Statistical method for Next Generation Sequencing pipeline comparison Pascal Roy, MD PhD EPICLIN 2016 Strasbourg 25-27 mai 2016 MH Elsensohn 1-4*, N Leblay 1-4, S Dimassi 5,6, A Campan-Fournier 5,6, A

More information

An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values

An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values 2016 Data Compression Conference An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values Claudio Alberti *, Noah Daniels +, Mikel Hernaez, Jan Voges^, Rachel L. Goldfeder, Ana

More information

MHC Region. MHC expression: Class I: All nucleated cells and platelets Class II: Antigen presenting cells

MHC Region. MHC expression: Class I: All nucleated cells and platelets Class II: Antigen presenting cells DNA based HLA typing methods By: Yadollah Shakiba, MD, PhD MHC Region MHC expression: Class I: All nucleated cells and platelets Class II: Antigen presenting cells Nomenclature of HLA Alleles Assigned

More information

therascreen BRCA1/2 NGS FFPE gdna Kit Handbook Part 2: Analysis

therascreen BRCA1/2 NGS FFPE gdna Kit Handbook Part 2: Analysis February 2017 therascreen BRCA1/2 NGS FFPE gdna Kit Handbook Part 2: Analysis Version 1 For the identification of variants in BRCA1 and BRCA2 For in vitro diagnostic use For use with Illumina MiSeqDx platform

More information

Detection of Fusion Genes by Targeted Roche 454 Sequencing

Detection of Fusion Genes by Targeted Roche 454 Sequencing Detection of Fusion Genes by Targeted Roche 454 Sequencing Hans-Ulrich Klein 1, Christoph Bartenhagen 1, Alexander Kohlmann 2, Vera Grossmann 2, Christian Ruckert 1, Torsten Haferlach 2, Martin Dugas 1

More information

Introduction to the UCSC genome browser

Introduction to the UCSC genome browser Introduction to the UCSC genome browser Dominik Beck NHMRC Peter Doherty and CINSW ECR Fellow, Senior Lecturer Lowy Cancer Research Centre, UNSW and Centre for Health Technology, UTS SYDNEY NSW AUSTRALIA

More information

Structural variation. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona

Structural variation. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona Structural variation Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona Genetic variation How much genetic variation is there between individuals? What type of variants

More information

Tangram: a comprehensive toolbox for mobile element insertion detection

Tangram: a comprehensive toolbox for mobile element insertion detection Wu et al. BMC Genomics 2014, 15:795 METHODOLOGY ARTICLE Open Access Tangram: a comprehensive toolbox for mobile element insertion detection Jiantao Wu 1,Wan-PingLee 1, Alistair Ward 1, Jerilyn A Walker

More information

Supplementary Material for Extremely low-coverage whole genome sequencing in South Asians captures population genomics information

Supplementary Material for Extremely low-coverage whole genome sequencing in South Asians captures population genomics information Supplementary Material for Extremely low-coverage whole genome sequencing in South Asians captures population genomics information Navin Rustagi, Anbo Zhou, W. Scott Watkins, Erika Gedvilaite, Shuoguo

More information

Digital DNA/RNA sequencing enables highly accurate and sensitive biomarker detection and quantification

Digital DNA/RNA sequencing enables highly accurate and sensitive biomarker detection and quantification Digital DNA/RNA sequencing enables highly accurate and sensitive biomarker detection and quantification Erwin Chen ( 陳立德 ) Technical Product Specialist QIAGEN Taiwan Precision medicine: Right drug, right

More information

Targeted Sequencing Using Droplet-Based Microfluidics. Keith Brown Director, Sales

Targeted Sequencing Using Droplet-Based Microfluidics. Keith Brown Director, Sales Targeted Sequencing Using Droplet-Based Microfluidics Keith Brown Director, Sales brownk@raindancetech.com Who we are: is a Provider of Microdroplet-based Solutions The Company s RainStorm TM Technology

More information

LUMPY: A probabilistic framework for structural variant discovery

LUMPY: A probabilistic framework for structural variant discovery LUMPY: A probabilistic framework for structural variant discovery Ryan M Layer 1, Aaron R Quinlan* 1,2,3,4 and Ira M Hall* 2,4 1 Department of Computer Science 2 Department of Biochemistry and Molecular

More information

SeqStudio Genetic Analyzer

SeqStudio Genetic Analyzer SeqStudio Genetic Analyzer Optimized for Sanger sequencing and fragment analysis Easy to use for all levels of experience From a leader in genetic analysis instrumentation, introducing the new Applied

More information

Towards detection of minimal residual disease in multiple myeloma through circulating tumour DNA sequence analysis

Towards detection of minimal residual disease in multiple myeloma through circulating tumour DNA sequence analysis Towards detection of minimal residual disease in multiple myeloma through circulating tumour DNA sequence analysis Trevor Pugh, PhD, FACMG Princess Margaret Cancer Centre, University Health Network Dept.

More information

Lab methods: Exome / Genome. Ewart de Bruijn

Lab methods: Exome / Genome. Ewart de Bruijn Lab methods: Exome / Genome 27 06 2013 Ewart de Bruijn Library prep is only a small part of the complete DNA analysis workflow DNA isolation library prep enrichment flowchip prep sequencing bioinformatics

More information

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es Sequence assembly Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing project Unknown sequence { experimental evidence result read 1 read 4 read 2 read 5 read 3 read 6 read 7 Computational requirements

More information

MedSavant: An open source platform for personal genome interpretation

MedSavant: An open source platform for personal genome interpretation MedSavant: An open source platform for personal genome interpretation Marc Fiume 1, James Vlasblom 2, Ron Ammar 3, Orion Buske 1, Eric Smith 1, Andrew Brook 1, Sergiu Dumitriu 2, Christian R. Marshall

More information

Variant Calling CHRIS FIELDS MAYO-ILLINOIS COMPUTATIONAL GENOMICS WORKSHOP, JUNE 19, 2017

Variant Calling CHRIS FIELDS MAYO-ILLINOIS COMPUTATIONAL GENOMICS WORKSHOP, JUNE 19, 2017 Variant Calling CHRIS FIELDS MAYO-ILLINOIS COMPUTATIONAL GENOMICS WORKSHOP, JUNE 19, 2017 Up-front acknowledgments Many figures/slides come from: GATK Workshop slides: http://www.broadinstitute.org/gatk/guide/events?id=2038

More information

Introduc)on to NGS Variant Calling

Introduc)on to NGS Variant Calling Introduc)on to NGS Variant Calling Bioinforma)cs analysis and annota)on of variants in NGS data workshop Cape Town, 4 th to 6 th April 2016 Sumir Panji, Amel Ghouila, Gerrit Botha Types of variants Learning

More information

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 Topics Genetic variation Population structure Linkage disequilibrium Natural disease variants Genome Wide Association Studies Gene

More information

Ion S5 and Ion S5 XL Systems

Ion S5 and Ion S5 XL Systems Ion S5 and Ion S5 XL Systems Targeted sequencing has never been simpler Explore the Ion S5 and Ion S5 XL Systems Adopting next-generation sequencing (NGS) in your lab is now simpler than ever The Ion S5

More information

RIPTIDE HIGH THROUGHPUT RAPID LIBRARY PREP (HT-RLP)

RIPTIDE HIGH THROUGHPUT RAPID LIBRARY PREP (HT-RLP) Application Note: RIPTIDE HIGH THROUGHPUT RAPID LIBRARY PREP (HT-RLP) Introduction: Innovations in DNA sequencing during the 21st century have revolutionized our ability to obtain nucleotide information

More information

Implementation of Ion AmpliSeq in molecular diagnostics

Implementation of Ion AmpliSeq in molecular diagnostics Implementation of Ion AmpliSeq in molecular diagnostics The Rotterdam Experience Ronald van Marion Deelnemersbijeenkomst SKML sectie Pathologie Amersfoort, 26 mei 2016 Molecular Diagnostics in Rotterdam

More information

Validation of Identity and Ancestry SNP Panels for the Ion PGM

Validation of Identity and Ancestry SNP Panels for the Ion PGM Validation of Identity and Ancestry SNP Panels for the Ion PGM Christopher Phillips, Carla Santos, Maria de la Puente, Manuel Fondevila, Ángel Carracedo, Maviky Lareu Forensic Genetics Unit, University

More information

Best practices for Variant Calling with Pacific Biosciences data

Best practices for Variant Calling with Pacific Biosciences data Best practices for Variant Calling with Pacific Biosciences data Mauricio Carneiro, Ph.D. Mark DePristo, Ph.D. Genome Sequence and Analysis Medical and Population Genetics carneiro@broadinstitute.org 1

More information

Sequencing, Assembling, and Correcting Draft Genomes Using Recombinant Populations

Sequencing, Assembling, and Correcting Draft Genomes Using Recombinant Populations INVESTIGATION Sequencing, Assembling, and Correcting Draft Genomes Using Recombinant Populations Matthew W. Hahn,*,,1 Simo V. Zhang, and Leonie C. Moyle* *Department of Biology and School of Informatics

More information

with drmid Dx for Illumina NGS systems

with drmid Dx for Illumina NGS systems Performance Characteristics BRCA MASTR Dx with drmid Dx for Illumina NGS systems Manufacturer Multiplicom N.V. Galileïlaan 18 2845 Niel Belgium Revision date: July 27, 2017 Page 1 of 8 Table of Contents

More information

Bioinformatics Advice on Experimental Design

Bioinformatics Advice on Experimental Design Bioinformatics Advice on Experimental Design Where do I start? Please refer to the following guide to better plan your experiments for good statistical analysis, best suited for your research needs. Statistics

More information

Comprehensive Analysis to Improve the Validation Rate for Single Nucleotide Variants Detected by Next- Generation Sequencing

Comprehensive Analysis to Improve the Validation Rate for Single Nucleotide Variants Detected by Next- Generation Sequencing Comprehensive Analysis to Improve the Validation Rate for Single Nucleotide Variants Detected by Next- Generation Sequencing Mi-Hyun Park 1., Hwanseok Rhee 2., Jung Hoon Park 2, Hae-Mi Woo 1, Byung-Ok

More information

Targeted resequencing

Targeted resequencing Targeted resequencing Sarah Calvo, Ph.D. Computational Biologist Vamsi Mootha laboratory Snapshots of Genome Wide Analysis in Human Disease (MPG), 4/20/2010 Vamsi Mootha, PI How can I assess a small genomic

More information

Outline. General principles of clonal sequencing Analysis principles Applications CNV analysis Genome architecture

Outline. General principles of clonal sequencing Analysis principles Applications CNV analysis Genome architecture The use of new sequencing technologies for genome analysis Chris Mattocks National Genetics Reference Laboratory (Wessex) NGRL (Wessex) 2008 Outline General principles of clonal sequencing Analysis principles

More information

Next Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline Workflows

Next Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline Workflows Genes 2012, 3, 545-575; doi:10.3390/genes3030545 Article OPEN ACCESS genes ISSN 2073-4425 www.mdpi.com/journal/genes Next Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline

More information