Jenny Gu, PhD, Strategic Business Development Manager, PacBio

IDT and PacBio joint presentation: Characterizing Alzheimer's Disease candidate genes and transcripts with targeted, long-read, single-molecule sequencing. Jenny Gu, PhD, Strategic Business Development Manager, PacBio

Characterizing Alzheimer's Disease candidate genes and transcripts with targeted, long-read, single-molecule sequencing
September 27, 2017 / Jenny Gu, Ph.D.
For Research Use Only. Not for use in diagnostic procedures. Copyright 2017 by Pacific Biosciences of California, Inc. All rights reserved.

AGENDA
- SMRT Sequencing technology overview
- Recommended IDT capture workflow for SMRT Sequencing
- Case study: Alzheimer's disease panel

ALZHEIMER'S DISEASE (AD)
Alzheimer's disease is the most common form of neurodegenerative dementia.
Figure: 46.8M people living with dementia worldwide (2015), projected to rise to 131.5M (2050).
Clinical characterization: progressive loss of memory and deficits in thinking, problem solving, and language. https://www.alz.co.uk/research/worldalzheimerreport2015.pdf
Neuropathological characterization: progressive cortical atrophy due to neuronal loss, and characteristic intracellular and extracellular deposits of insoluble tau and amyloid β proteins. http://www.reverseagingcentre.com/media/links/signs-ofalzheimers/

ALZHEIMER'S DISEASE (AD)
The complex genetic makeup of AD:
- Genetically divided into two groups: early-onset and late-onset
- Relative risk for first-degree relatives is 3.5-7.5
- 30-48% of AD patients have an affected first-degree relative
Early-onset AD:
- For 2-10% of patients, first symptoms occur in their 20s or 30s.
- Four genes account for 5-10% of early-onset AD: APP, PSEN1, PSEN2, APOE
Late-onset AD:
- Manifests after 65 years of age
- Multifactorial, with strong genetic predisposition
- GWAS have identified 20+ genetic risk loci with small odds ratios (1.1-2.0 per risk allele), including both common functional variants and rare and structural variants

CANDIDATE DISEASE GENES IN ALZHEIMER'S DISEASE (AD)
- Several-decade-long search for risk genes in Alzheimer's disease
- Many associated genetic loci contain several genes
- Which candidates are involved in disease risk remains unclear (20+ genes)
Strategies for assessing GWAS candidate genes:
- DNA sequencing
- Transcriptome sequencing
- Proteome studies
- Methylome studies
Cuyvers E. et al. (2016) Genetic variations underlying Alzheimer's disease: evidence from genome-wide association studies and beyond. Lancet Neurol. 15(8), 857-68.

SEQUEL SYSTEM
Typical performance:
- Average read length: 10-18 kb
- Consensus accuracy: achieves QV50
- Throughput per cell: 5-8 Gb
- SMRT Cells per run: 1-16
- Movie lengths: 30 minutes to 10 hours
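The "QV50" consensus-accuracy figure is on the Phred scale, where QV = -10 log10(P_error). A minimal conversion sketch that uses only that standard definition (the function names are illustrative, not part of any PacBio tool):

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Phred scale: QV = -10 * log10(P_error), so P_error = 10^(-QV/10)."""
    return 10 ** (-qv / 10)


def qv_to_accuracy(qv: float) -> float:
    """Per-base accuracy implied by a Phred quality value."""
    return 1.0 - qv_to_error_rate(qv)


def accuracy_to_qv(accuracy: float) -> float:
    """Inverse conversion; accuracy must be strictly below 1.0."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    for qv in (20, 30, 50):
        print(f"QV{qv}: error rate {qv_to_error_rate(qv):.0e}, "
              f"accuracy {qv_to_accuracy(qv):.3%}")
```

By this definition QV50 corresponds to one expected error per 100,000 bases (99.999% accuracy), and a ">99.9% accuracy" consensus corresponds to better than QV30.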

TYPICAL DATA
Read lengths >20 kb; data per SMRT Cell: 5-8 Gb
- Half of data in reads >20 kb
- Top 5% of reads >35 kb
- Maximum read lengths >60 kb
Figure: read-length distribution (x-axis: read length (bp); y-axis: reads (#)).
Read length data shown from a 30 kb size-selected human library on the Sequel System (10-hour movie, 2.0 chemistry) with a total output of 7.6 Gb. Each Sequel System SMRT Cell 1M generates ~365,000 reads.
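The summary statistics on this slide can be computed from a list of read lengths. A minimal sketch, assuming the read lengths have already been extracted from the run output; the function names are hypothetical. "Half of data in reads >20 kb" corresponds to `fraction_of_bases_above(lengths, 20_000) >= 0.5`, which is closely related to the read-length N50.

```python
from typing import Sequence


def read_length_n50(lengths: Sequence[int]) -> int:
    """Read length N50: the length L such that reads >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no sequenced bases")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable: running reaches total on the last read")


def fraction_of_bases_above(lengths: Sequence[int], threshold: int) -> float:
    """Fraction of all sequenced bases that lie in reads longer than `threshold`."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("no sequenced bases")
    return sum(n for n in lengths if n > threshold) / total


def shortest_in_top_fraction(lengths: Sequence[int], fraction: float = 0.05) -> int:
    """Shortest read among the longest `fraction` of reads (e.g. the top 5%)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no reads")
    k = max(1, int(len(ordered) * fraction))
    return ordered[k - 1]
```

For example, with read lengths of 5, 10, 20, 25, and 40 kb, the N50 is 25 kb and 65% of the bases lie in reads longer than 20 kb.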

BENEFITS OF LONG-READ SEQUENCING FOR CHARACTERIZING GENOMIC STRUCTURAL VARIATION
- Structural variation (SV) is an important contributor to human diversity and disease.
- SV is also difficult to characterize.
Figure: example SV types and mechanisms (Mechanisms underlying structural variant formation in genomic disorders. Carvalho CM et al. Nat Rev Genet. (2016)).
Targeted SMRT Sequencing allows scientists to directly characterize:
- Complete genes (introns & exons)
- Phased variants (allelic haplotypes)
- Repetitive regions
- Regulatory regions (upstream/downstream)
- Insertions & deletions
- Copy number variations
at high coverage for specific genes or regions of interest across multiple samples.

GENETIC VARIATION SEQUENCING WITH SMRT SEQUENCING
Figure: variant types plotted by size of variant (log scale, 1 bp to 100 Mb): SNPs; small indels; STRs & VNTRs (repeat expansions); mobile elements (L1, Alu, SVA); large insertions and deletions; copy number variation; medium to large SVs; inversions/translocations; complex variants; large structural rearrangements; phasing of SNVs and SVs; haplotype reconstruction.
- One PacBio read spans most variants.
- Assembled PacBio reads span euchromatic genome variation.

ADDITIONALLY CHARACTERIZE TRANSCRIPTOME SPLICE VARIATION WITH LONG-READ SEQUENCING
- Proteins and their functions are not only impacted by variants in exonic regions.
- Variants in regulatory regions (enhancers/promoters, including methylation) and intronic regions can also play an important role.
- Alternative splicing produces high transcript isoform diversity.
- Obtain full-length transcript sequences with Iso-Seq analysis.
National Human Genome Research Institute. Bioinformatics: Finding genes. (2013) http://www.genome.gov/25020001

TRACE VARIANTS TO SPECIFIC ALLELES WITH PHASED HETEROZYGOUS SNPS
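To make the phasing idea concrete: once heterozygous SNPs have been phased into two haplotypes, each long read can be assigned to a haplotype by a vote over the SNP sites it covers. A minimal sketch, assuming each read has already been reduced to its base calls at the SNP positions; the input format and function names are hypothetical, and this is not the SAMtools phasing algorithm used in the workflow.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: phased heterozygous SNPs as
#   reference position -> (allele on haplotype 0, allele on haplotype 1)
PhasedSites = Dict[int, Tuple[str, str]]


def assign_haplotype(read_bases: Dict[int, str], sites: PhasedSites,
                     min_margin: int = 1) -> Optional[int]:
    """Return 0 or 1 if the read favours one haplotype by >= min_margin sites, else None."""
    votes = [0, 0]
    for pos, (allele0, allele1) in sites.items():
        base = read_bases.get(pos)  # None if the read does not cover this SNP
        if base == allele0:
            votes[0] += 1
        elif base == allele1:
            votes[1] += 1
        # any other base (e.g. a sequencing error) casts no vote
    if votes[0] - votes[1] >= min_margin:
        return 0
    if votes[1] - votes[0] >= min_margin:
        return 1
    return None


def bin_reads_by_haplotype(reads: Dict[str, Dict[int, str]],
                           sites: PhasedSites) -> Dict[Optional[int], List[str]]:
    """Group read names into haplotype 0, haplotype 1, or unassigned (None)."""
    bins: Dict[Optional[int], List[str]] = {0: [], 1: [], None: []}
    for name, bases in reads.items():
        bins[assign_haplotype(bases, sites)].append(name)
    return bins
```

A read that carries one allele from each haplotype, or that covers no heterozygous site, is left unassigned rather than forced into a bin; long reads make this rarer because each read spans more SNPs.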

CASE STUDY: VARIANT SCREENING IN ALZHEIMER'S DISEASE WITH LONG-READ SEQUENCING
- Genomic and transcriptomic (cDNA) capture experiment
- Combined data provide better insight into variant-affected gene expression
- Gene panel (35 candidate genes) applied to two AD patients:
  - Average gDNA fragment size: ~6 kb
  - Full-length transcripts ranging from <1 kb to ~10 kb

PACBIO TARGETED PROBE-BASED CAPTURE WORKFLOW (GENOMIC DNA CAPTURE)
Experimental pipeline (starting from genomic DNA):
1. Shear to 7 kb (6 kb for multiplex)
2. Ligate barcoded adapters
3. Amplification
4. Size selection
5. Probe hybridization, bead capture, wash
6. Amplification and SMRTbell prep + size selection
7. Sequencing
8. Analysis
Figure size annotations: 5-9 kb.
Informatics pipeline:
9. Map reads of insert to reference
10. Phasing with SAMtools
11. Bin reads by haplotype
12. Phased allelic consensus sequence
13. Tertiary analysis
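The binning and phased-consensus steps of the informatics pipeline can be pictured with a toy example. In the actual workflow each haplotype bin is polished with PacBio arrow, which models the raw subreads; the sketch below deliberately swaps in a much simpler per-position majority vote over reads that are assumed to be already placed in reference coordinates without indels. It illustrates what "one consensus per haplotype bin" means, not how arrow works; the function name and input format are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def majority_consensus(reads: List[Tuple[int, str]], region_start: int,
                       region_end: int, min_depth: int = 1) -> str:
    """Per-position majority base over reads given as (reference start, gapless sequence).

    Positions covered by fewer than max(min_depth, 1) reads are reported as 'N'.
    """
    columns = [Counter() for _ in range(region_end - region_start)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if region_start <= pos < region_end:
                columns[pos - region_start][base] += 1
    consensus = []
    for counts in columns:
        depth = sum(counts.values())
        if depth >= max(min_depth, 1):
            consensus.append(counts.most_common(1)[0][0])
        else:
            consensus.append("N")
    return "".join(consensus)
```

Running this separately on the reads of each haplotype bin yields one sequence per allele; raising `min_depth` trades completeness for confidence, which mirrors the slides' note that consensus accuracy depends on coverage.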

BEST PRACTICE SUMMARY: GENOMIC CAPTURE
- Save on project costs by multiplexing and spacing probes up to 1 kb apart.
- Multiplex up to 12 samples.
- Use PacBio linear barcoded adapters.
- High molecular weight DNA is required.
- Size selection is highly recommended to maximize long-read recovery.
- Aim for 100-fold coverage of the targeted panel size (full-length gene coverage).
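The 100-fold coverage target combined with multiplexing implies a simple back-of-the-envelope yield calculation. The slides give neither the panel size, the on-target rate, nor the usable bases per SMRT Cell for this design, so those are placeholder parameters in the sketch below (and "usable" bases differ depending on whether coverage is counted in subreads or consensus reads).

```python
import math


def required_on_target_bases(panel_bp: int, fold_coverage: float, n_samples: int) -> float:
    """Bases that must land on target: panel size x fold coverage x number of samples."""
    return panel_bp * fold_coverage * n_samples


def smrt_cells_needed(panel_bp: int, fold_coverage: float, n_samples: int,
                      on_target_fraction: float, usable_bases_per_cell: float) -> int:
    """Rough SMRT Cell count; on_target_fraction and usable_bases_per_cell are assumptions."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    if usable_bases_per_cell <= 0:
        raise ValueError("usable_bases_per_cell must be positive")
    total_bases = required_on_target_bases(panel_bp, fold_coverage, n_samples) / on_target_fraction
    return math.ceil(total_bases / usable_bases_per_cell)
```

For a hypothetical 1.5 Mb panel at 100-fold across 12 samples, 1.8 Gb must land on target; at an assumed 50% on-target rate and 1 Gb of usable data per cell, that rounds up to 4 SMRT Cells.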

AD SAMPLES: SHEARED GDNA QC
Recommend starting with 2 µg of HMW gDNA.
Figure: gDNA after 10 kb shear.

SMRTBELL LIBRARY QC (SIZE-SELECTED)
Figure: final size-selected library.

GRCH38 SUBREAD MAPPING RESULTS
- Skeletal muscle: 7.4 Gb, 2.2 M reads
- Brain: 8.4 Gb, 2.5 M reads

PACBIO TARGETED PROBE-BASED CAPTURE WORKFLOW (TRANSCRIPTOME WITH SIZE SELECTION)
Experimental pipeline:
1. mRNA
2. cDNA library + barcodes
3. Amplification
4. Size selection (optional)
5. Probe hybridization, bead capture, wash
6. Amplification and SMRTbell prep
7. Sequencing
8. Analysis
Figure size annotation: 5-9 kb.
Informatics pipeline:
9. Iso-Seq analysis
10. Tertiary analysis

BEST PRACTICE SUMMARY: CDNA CAPTURE
- Recover high-quality RNA transcripts.
- Size selection is optional, but helpful for enriching specific size fractions.
- Targeted capture with Iso-Seq analysis is recommended for characterizing splice isoforms.
- Not recommended for characterizing gene expression levels.
- Aim for at least 30-fold coverage per anticipated splice isoform in the samples.
- Probes can be designed to exons only and/or to include introns.

AD SAMPLES: MRNA QC
Recommend RIN > 6 (RNA Integrity Number).
- Temporal lobe 1 RNA: RIN = 8.0
- Temporal lobe 2 RNA: RIN = 8.1

EXAMPLE WHOLE TRANSCRIPTOME SMRTBELL LIBRARY (CDNA)

DESIGNING CUSTOM IDT XGEN LOCKDOWN CAPTURE PANEL
- A key benefit of xGen Lockdown Probes is flexibility in design.
- Existing probe panels do not need to be redesigned.
- However, a full-gene design is recommended: include introns and exons, plus extra upstream and downstream sequence.
- Probes can be spaced up to 1000 bp apart.
- Use the same probes for genomic and cDNA capture.
Figure: full-gene design (Gene A, Gene B).
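The full-gene design with sparse probe spacing can be sketched as simple tiling across the gene plus flanks. This is a hypothetical illustration, not IDT's design tool: it reads "spaced up to 1000 bp apart" as the gap between the end of one probe and the start of the next, and the 120 bp probe length and 5 kb default flank are assumptions. Real designs also account for repeats and sequence composition.

```python
from typing import List, Tuple


def tile_probes(gene_start: int, gene_end: int, flank: int = 5_000,
                probe_len: int = 120, gap: int = 1_000) -> List[Tuple[int, int]]:
    """Place probes across a full-gene region (gene plus upstream/downstream flanks).

    Returns half-open (start, end) intervals with `gap` bp of unprobed sequence
    between consecutive probes. probe_len and flank defaults are assumptions.
    """
    region_start = max(0, gene_start - flank)
    region_end = gene_end + flank
    probes = []
    pos = region_start
    while pos + probe_len <= region_end:
        probes.append((pos, pos + probe_len))
        pos += probe_len + gap
    return probes
```

Widening `gap` reduces the probe count roughly in proportion, which is the cost saving the best-practice summary refers to; long fragments still cover the unprobed stretches because each captured molecule spans several kilobases.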

SNPs AND LARGER SVs DISCOVERED IN AD SAMPLES
Study results:
- Detected a broad range of genomic variants (SNPs and SVs), including 31 unique SVs ranging from 65 bp to several kb in size.
- 500+ isoforms found in each patient:
  - Patient 1: 515 isoforms
  - Patient 2: 507 isoforms
- 88% novel splice isoforms identified.
- Only 39 isoforms shared among both patients and those reported in GENCODE v25.
Figure: Venn diagram of isoforms in Patient 1, Patient 2, and GENCODE v25 (region counts: 319, 312, 154, 67, 39, 3, 2).

RIN3 GENE: ~50 bp INSERTION DETECTED

ZCWPW1 GENE: ~750 bp DELETION DETECTED IN BOTH PATIENTS
Figure panels: Patient 1, Patient 2.
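A ~50 bp insertion (RIN3) or a ~750 bp deletion (ZCWPW1) can fall entirely inside one long read, so it shows up within a single alignment as a large I or D operation in the CIGAR string defined by the SAM specification. The slides do not name the SV caller used; dedicated tools such as pbsv aggregate evidence across many reads. The sketch below only scans one alignment, and the 50 bp default threshold is an assumption.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # SAM operations that advance the reference position


def large_indels(ref_start: int, cigar: str, min_size: int = 50) -> List[Tuple[str, int, int]]:
    """Return (type, reference position, length) for I/D operations >= min_size."""
    events = []
    ref_pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "D" and length >= min_size:
            events.append(("DEL", ref_pos, length))
        elif op == "I" and length >= min_size:
            events.append(("INS", ref_pos, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return events
```

Short-read alignments usually cannot contain a 750 bp deletion flanked by enough aligned sequence on both sides; with multi-kilobase reads the event sits in the middle of the alignment, which is why the slides can show it directly.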

BACE1 GENE: PHASED ALLELES (34 KB)
Heterozygous SNPs can be used to phase alleles across multi-kilobase regions.
Figure tracks: Phase 0, Phase 1, gene, probes, target, phased SNPs.

BIN1 GENE: PHASED ALLELES (63 KB)
Heterozygous SNPs can be used to phase alleles across multi-kilobase regions.
Figure tracks: Phase 0, Phase 1, gene, probes, target, phased SNPs.

MAPT GENE RESULTS FOR PATIENT 1
Heterozygous genomic variants can be linked to corresponding expressed transcripts.
MAPT gene results:
- Detected a heterozygous deletion.
- One allele is transcribed into 21 isoforms and the other into only 5.
- Detected a novel exon and transcript.
Figure labels: 21 isoforms, 5 isoforms.
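Once each full-length transcript read has been assigned to an allele (for example, by whether it carries the heterozygous deletion or linked SNPs), isoforms can be counted per allele. A simplified sketch under stated assumptions: allele labels are hypothetical inputs, introns are taken from the `N` (skipped region) operations of spliced alignments, and reads with identical intron chains count as one isoform. Iso-Seq's own collapsing is more tolerant than this (for example, of small differences at transcript ends).

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_chain(ref_start: int, cigar: str) -> Tuple[Tuple[int, int], ...]:
    """Introns as (start, end) reference intervals, taken from 'N' (skipped region) operations."""
    introns = []
    pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            introns.append((pos, pos + length))
        if op in "MDN=X":  # operations that advance along the reference
            pos += length
    return tuple(introns)


def isoforms_per_allele(reads: Iterable[Tuple[str, int, str]]) -> Dict[str, int]:
    """Count distinct intron chains per allele; reads are (allele label, ref start, CIGAR)."""
    chains = defaultdict(set)
    for allele, ref_start, cigar in reads:
        chains[allele].add(intron_chain(ref_start, cigar))
    return {allele: len(found) for allele, found in chains.items()}
```

Because a full-length read covers both the variant and the whole splice structure, allele assignment and isoform identity come from the same molecule, with no need to infer linkage between separate short reads.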

ZCWPW1 GENE: RETAINED INTRONS AND NEW EXONS
Figure labels: novel exon, retained intron; panels: Patient 1, Patient 2.

CONCLUSION
- AD has a large economic impact on global society (2010: $604B).
- To date, 20+ putative genetic risk variants have been mapped.
- Associated SNPs are usually not the true causative variants.
- Combining gDNA and cDNA data is more informative.
- Custom IDT xGen Lockdown Panels allow flexibility to scale projects.
- SMRT Sequencing provides multi-kilobase phased alleles and full-length transcripts.
Image: http://www.mvcenters.com/2015/02/11/dementiatakes-toll-claims-another-american-great-dean-smith/
Roses A.D. et al. (2016) Structural variants can be more informative for disease diagnostics, prognostics and translation than current SNP mapping and exon sequencing. Expert Opin Drug Metab Toxicol. 12(2), 135-47.

ACKNOWLEDGEMENT
Kevin Eng, Ting Hon, Elizabeth Tseng, Aaron Wenger, William Rowell, Jenny Ekholm, Steve Kujawa, Kristina Giorda, Jiashi Wang, Mirna Jarosz
Visit the PacBio Blog for new announcements and updates on Targeted Sequencing!
http://www.pacb.com/blog
http://www.pacb.com/applications/targeted-sequencing/
Feel free to get in touch: Jenny Gu (jgu@pacb.com)

www.pacb.com
Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Advanced Analytical Technologies. xGen and Lockdown are trademarks of Integrated DNA Technologies, Inc. All other trademarks are the sole property of their respective owners.

gDNA Capture Supplemental Information

PACBIO POLYMERASE READS
Figure panels: skeletal muscle, brain.

SMRT LINK PROVIDES BASIC PROCESSING OF RAW DATA FOR TARGETED CAPTURE ENRICHMENT STUDIES
SMRT Analysis produces:
- Filtered subreads
- Circular consensus sequences
- Alignments to reference (BAM files)
- Iso-Seq full-length transcripts

BIOINFORMATICS WORKFLOW FOR PHASING ALLELES
1. Raw data
2. SMRT Link
3a. CCS reads
3b. Subreads
4. SMRT Link
5. Aligned BAM file
6. Probe *.bed
7. capture2target.py
8. Defined phase blocks
9. samtools: subset and phase
10. Phased alleles/region
11. cmdline: PacBio arrow (polish data)
12. Phased consensus sequences (*.fasta), >99.9% accuracy (dependent on coverage)
Visualize: IGV 3.0
Legend: SMRT Link; command-line tools; third-party software.
GitHub: Targeted phasing consensus (genomic capture)
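In this workflow, the probe BED file and capture2target.py turn aligned reads into defined phase blocks, one per target region. The slides do not describe that script's internals, so the following is only a hypothetical illustration of the underlying interval-overlap idea (it is not capture2target.py): parse target intervals from a BED file and assign each aligned read, given as (name, chromosome, start, end), to every target it overlaps.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Target = Tuple[int, int, str]  # (start, end, name), BED-style half-open coordinates


def load_targets(bed_lines: Iterable[str]) -> Dict[str, List[Target]]:
    """Parse BED lines into per-chromosome target lists sorted by start."""
    targets: Dict[str, List[Target]] = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        targets[chrom].append((start, end, name))
    for intervals in targets.values():
        intervals.sort()
    return dict(targets)


def bin_reads_by_target(targets: Dict[str, List[Target]],
                        alignments: Iterable[Tuple[str, str, int, int]]) -> Dict[str, List[str]]:
    """Assign each aligned read (name, chrom, start, end) to every target it overlaps."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read_name, chrom, start, end in alignments:
        for t_start, t_end, name in targets.get(chrom, []):
            if t_start >= end:
                break  # targets are sorted by start, so no later target can overlap
            if t_end > start:
                bins[name].append(read_name)
    return dict(bins)
```

A linear scan per read is adequate for a panel of a few dozen genes; a much larger panel would call for an interval tree or a bisect-based lookup instead.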