Targeted resequencing

Similar documents
SUPPLEMENTARY INFORMATION

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014

SEQUENCING. M Ataei, PhD. Feb 2016

THE HEALTH AND RETIREMENT STUDY: GENETIC DATA UPDATE

Variant prioritization in NGS studies: Annotation and Filtering "

High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency

High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency

HaloPlex HS. Get to Know Your DNA. Every Single Fragment. Kevin Poon, Ph.D.

Personal Genomics Platform White Paper Last Updated November 15, Executive Summary

Introduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017

Using VarSeq to Improve Variant Analysis Research

Whole genome sequencing in the UK Biobank

Prioritization: from vcf to finding the causative gene

The Agilent Technologies SureSelect Platform for Target Enrichment

USER MANUAL for the use of the human Genome Clinical Annotation Tool (h-gcat) uthors: Klaas J. Wierenga, MD & Zhijie Jiang, P PhD

Applied Bioinformatics

Comprehensive Rare Variant Analysis via Whole-Genome Sequencing to Determine the Molecular Pathology of Inherited Retinal Disease.

Novel Variant Discovery Tutorial

Automating the ACMG Guidelines with VSClinical. Gabe Rudy VP of Product & Engineering

Structure, Measurement & Analysis of Genetic Variation

Identification of the Photoreceptor Transcriptional Co-Repressor SAMD11 as Novel Cause of. Autosomal Recessive Retinitis Pigmentosa

Genome-wide genetic screening with chemically-mutagenized haploid embryonic stem cells

UKPMC Funders Group Author Manuscript Nature. Author manuscript; available in PMC 2011 April 1.

Understanding the science and technology of whole genome sequencing

Alissa Interpret The next evolution of Cartagenia Bench

Jumping Into Your Gene Pool: Understanding Genetic Test Results

Target Enrichment Strategies for Next Generation Sequencing

Welcome to the NGS webinar series

Characterization of novel rare genetic variants identified by next generation sequencing

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics

Genetic Analysis Platform. Wendy Winckler, Ph.D. October 7, 2010

Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4

Mutation entries in SMA databases Guidelines for national curators

SureSelect Target Enrichment for the Ion Proton TM Next Generation Sequencing System

Amapofhumangenomevariationfrom population-scale sequencing

Sequence Variations. Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms. NCBI SNP Primer:

Analysis of neo-antigens to identify T-cell neo-epitopes in human Head & Neck cancer. Project XX1001. Customer Detail

Genetics 101: What does it mean to have an LBSL-specific mutation

Niemann-Pick Type C Disease Gene Variation Database ( )

Genomic resources. for non-model systems

Whole Genome Sequencing. Biostatistics 666

Next-Generation Sequencing. Technologies

The Diploid Genome Sequence of an Individual Human

Next Genera*on Sequencing II: Personal Genomics. Jim Noonan Department of Gene*cs

Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager

Get to Know Your DNA. Every Single Fragment.

Infinium Global Screening Array-24 v1.0 A powerful, high-quality, economical array for population-scale genetic studies.

Training materials.

Strand NGS Variant Caller

Genome annotation & EST

DNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros

G E N OM I C S S E RV I C ES

Molecular Genetics of Disease and the Human Genome Project

INTERDISCIPLINARY COLLABORATIONS MEDICAL RECORDS AND GENOMICS (EMERGE) NETWORK AND EHRS: THE ELECTRONIC. October 30, 2015

INTRODUCTION TO MOLECULAR GENETICS. Andrew McQuillin Molecular Psychiatry Laboratory UCL Division of Psychiatry 22 Sept 2017

SNPs - GWAS - eqtls. Sebastian Schmeier

Custom Panels via Clinical Exomes

Computational Workflows for Genome-Wide Association Study: I

Long-range gene regulation

Bioinformatics Advice on Experimental Design

Genomics: Human variation

New Frontiers of Genetic Profiling Achieve Higher Sensitivity and Greater Insights with Molecular Barcodes, Long Read Capture and Optimized Exomes

ACGS: Standardisation of variant interpretation and reports

Mapping and Mapping Populations

MedSavant: An open source platform for personal genome interpretation

UHT Sequencing Course Large-scale genotyping. Christian Iseli January 2009

Genomic solutions for complex disease

Potential of human genome sequencing. Paul Pharoah Reader in Cancer Epidemiology University of Cambridge

ANNOVAR Variant Annotation and Interpretation

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes

Agilent NGS Solutions : Addressing Today s Challenges

Lecture 2: Biology Basics Continued. Fall 2018 August 23, 2018

Performance of the Newly Developed Non-Invasive Prenatal Multi- Gene Sequencing Screen

About Strand NGS. Strand Genomics, Inc All rights reserved.

A Crash Course in NGS for GI Pathologists. Sandra O Toole

resequencing storage SNP ncrna metagenomics private trio de novo exome ncrna RNA DNA bioinformatics RNA-seq comparative genomics

Functional Annotation and Prioritization of Whole Exome and Whole Genome Sequencing Variants. Mulin Jun Li

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel.

Case 3. Laura Yeates BSc (hons) Grad Dip Gen Couns. FHGSA

Analytics Behind Genomic Testing

Whole genome sequencing in drug discovery research: a one fits all solution?

Illumina s GWAS Roadmap: next-generation genotyping studies in the post-1kgp era

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR

Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573

Surely Better Target Enrichment from Sample to Sequencer

Roadmap: genotyping studies in the post-1kgp era. Alex Helm Product Manager Genotyping Applications

The Agilent Technologies SureSelect Platform for Target Enrichment

The Nature of Genes. The Nature of Genes. The Nature of Genes. The Nature of Genes. The Nature of Genes. The Genetic Code. Genes and How They Work

Supplementary Figures and Data

Gene mutation and DNA polymorphism

Emerging applications of SMRT Sequencing

Course Overview: Mutation Detection Using Massively Parallel Sequencing

Variant detection analysis in the BRCA1/2 genes from Ion torrent PGM data

Sample to Insight. Dr. Bhagyashree S. Birla NGS Field Application Scientist

PERSPECTIVES. A gene-centric approach to genome-wide association studies

A DE NOVO NONSENSE MUTATION IN MAGEL2 IN A PATIENT INITIALLY DIAGNOSED AS OPITZ-C: SIMILARITIES BETWEEN SCHAAF-YANG AND

Gathering of pathogenicity evidence for novel variants. By Lewis Pang

The Revolution in Human Genetics: Deciphering Complexity. David Galas Institute for Systems Biology, Seattle, WA

For more information about how to cite these materials visit

Informed Consent for Columbia Combined Genetic Panel (CCGP) for Adults Please read the following

Transcription:

Targeted resequencing Sarah Calvo, Ph.D. Computational Biologist Vamsi Mootha laboratory Snapshots of Genome Wide Analysis in Human Disease (MPG), 4/20/2010 Vamsi Mootha, PI

How can I assess a small genomic target across many samples? Limitations of whole exome sequencing expensive (apple 1 Illumina lane/sample) coding only (~20% of all functional elements) Multi-Species Conserved Seqs Margulies et al. 2003

How can I assess a small genomic target across many samples? Limitations of whole exome sequencing expensive (apple 1 Illumina lane/sample) coding only (~20% of all functional elements) Goal: sequence arbitrary small genomic regions, across many samples genomic regions, eg GWAS non-coding genic, eg UTRs, regulatory, conserved seqs small number of candidate genes (100s)

Outline Snapshot of Mito10k (100 candidate genes x 100 patients) Lessons to share

Complex I of the mitochondrial respiratory chain

Complex I deficiency Structure 46 subunits, 10 known assembly factors Complex I Deficiency Most common OXPHOS disease Phenotypes range from neonatal lethal, to adult-onset neurodegeneration leukodystrophy, cardiomyopathy, seizures, Leigh syndrome 26 disease-related genes (19 subunits + 7 assembly factors) Diagnosis difficult No effective treatment Goal: identify molecular basis for complex I deficiency Method: sequence candidate genes in patient cohort

New candidates for complex I deficiency Catalog of 1100 mitochondrial genes (Pagliarini, Calvo et al. Cell 2008) Phylogenetic profiling 45 subunits complex I

New candidates for complex I deficiency Catalog of 1100 mitochondrial genes (Pagliarini, Calvo et al. Cell 2008) Phylogenetic profiling 103 candidate genes 22 mtdna 81 nuclear

Mito10K: sequence 100 genes X 100 patients 103 Samples 60 patients 43 patient controls 42 healthy controls X 103 Genes 81 nuclear 22 mtdna (145Kb target) Discovery Screen Pool DNA (~20 per pool) PCR amplify targets Illumina sequence (3400X) Identify SNVs / indels Genotyping Phase Sequenom SNVs Sanger indels Experimental Follow-up Rescue RCCI defect in patient fibroblasts

Mito10K: sequence 100 genes X 100 patients 103 Samples 60 patients 43 patient controls 42 healthy controls X 103 Genes 81 nuclear 22 mtdna (145Kb target) Discovery Screen Pool DNA (~20 per pool) PCR amplify targets Illumina sequence (3400X) Identify SNVs / indels Genotyping Phase Sequenom SNVs Sanger indels Experimental Follow-up Rescue RCCI defect in patient fibroblasts Challenge: detect singleton SNVs, given high error rates Method: Syzygy 92% sensitivity (76% singletons) 9% false positives ~300 SNVs detected per pool Manuel Rivas

Can we identify likely deleterious variants?

Can we find causal mutations in patient cohort? Expect: known pathogenic mtdna variants recessive-type variants in known nuclear genes 60 patients with complex I deficiency

How can we establish pathogenicity of novel genes? Patient fibroblasts show complex I defect

Can we find causal mutations in patient cohort? 60 patients with complex I deficiency

Can we find causal mutations in patient cohort? Diagnosis of 22% of patient cohort 2 novel disease-related genes 5 novel mutations in known disease genes What about rest? - non-targeted gene? - non-coding region? - poor coverage? - less likely deleterious? - dominant? - synergistic variants? 60 patients with complex I deficiency

Mito10K summary Mito10k: targeted resequencing of ~100 candidate genes x 100 patients cost-effective diagnosis in 22% of undiagnosed patients 2 novel disease-related genes Key: availability of cellular models Available methods/tools Syzygy (http://www.broadinstitute.org/ftp/pub/mpg/syzygy/syzygy.zip) Definition of likely deleterious (Appendix A in this presentation) IGV (http://www.broadinstitute.org/igv/)

Outline Snapshot of Mito10k (100 candidate genes x 100 patients) Lessons to share 1. sample preparation 2. validation of rare variants 3. cost 4. phasing

Sample preparation: what if I don t have enough DNA? Issue: need 3ug DNA for sequencing Method: whole genome amplification (WGA) pros: works well (straight-forward, inexpensive) cons: possible mutations unequal amplification of regions unequal amplification of alleles rolling PCR artifact (if DNA slightly degraded) Observations: Pooling: unequal amplification mtdna pools with 96% mtdna from 1 person Single-sample: rolling PCR exons with excessive coverage, false SNP/indel calls Conclusions: WGA works, however avoid if possible

How can I validate rare variants? Method: genotype via Sequenom or BeadXPress Assess Sequenom accuracy: Sanger validation of 82 DNA sites Accurate genotype (88%) False positive (10%) Hom/Het Miscall (2%) Caveats (not necessary Sequenom error): WGA (mutation or unequal amplification of alleles) Sequenom design params (# samples iplexed) Sample mixups Conclusions: care with design of rare variant validation

How should I minimize cost in project design? Observation from Mito10K pilot: 7 Illumina lanes (~$10K) 40,000 Sequenom genotypes (~20K) Months data analysis Conclusions: to minimize analysis costs: design fewer data tranches, larger # samples

How can I phase rare heterozygous variants? Issue: If interested in mutations consistent with recessive inheritance, need to determine if 2 observed rare hets are: Method: a) in-phase (same parental chromosome), or b) compound heterozygous (different parental chromosomes) common variants: computational methods (eg imputation) rare variants: genotype familial DNA or experimental method to phase variants Conclusions: obtain parental DNA during sample collection OR consider experimental methods to phase

Acknowledgements Mootha Laboratory Vamsi Mootha Olga Goldberger Genome Analysis Toolkit Team Mark DePristo Eric Banks Andrey Sivachenko Genome Analysis Platform Stacey Gabriel Noel Burtt Candace Guiducci Gabe Crawford Rob Onofrio Michelle Redman Entire Genetic Analysis platform Pooled Sequencing Analysis Team Manuel Rivas Jason Flannick Mark Daly James Pirruccello Moran Cabili Team PooledSeq Biological Samples Platform Scott Mahan Kristin Ardlie Susan Flynn Entire BSP platform Murdoch Children s Research Institute David Thorburn Elena Tucker Alison Compton Sequencing Platform Jen Baldwin Jane Wilkinson Lauren Ambrogio Entire Seq platform Patients! Funding: NIGMS, NHGRI

Appendix A: definition of likely deleterious variants For the Mito10K project, we used the following algorithm to identify variants most likely to underlie a rare, devastating phenotype 1. Include all disease-associated variants (Human Gene Mutation Database & curation) 2. Next, exclude variants present in healthy individuals, based on 1. dbsnp129 2. 1000 genomes project, pilot 1 3. 42 HapMap controls also sequenced 4. mtdna variants present in mtdb with minor allele frequency > 0.005 3. Next, include: 1. coding indel 2. nonsense variant 3. missense variant at an amino acid conserved in 10 aligned vertebrate species, based on the multiz44way genome alignments (UCSC), or predicted as damaging by PolyPhen-2.0 (HumVar training data) 4. splice-site (splice acceptor sites -1,-2,-3, and splice donor sites -1,1,2,3,5 selected based on training data consisting of all 8189 HGMD disease-associated splice variants) 5. trna variants in mitochondrial genome 6. 5 UTR variants which create or disrupt an AUG alternate start codon

Genetic diagnosis of CI cohort (94 unrelated individuals)