Figure S1. Unrearranged locus. Rearranged locus. Concordant read pairs. Region1. Region2. Cluster of discordant read pairs, bundle

Figure S1
(a) Unrearranged locus and rearranged locus. Labels: concordant read pairs; Region1; Region2; cluster of discordant read pairs (bundle).
(b) Physical coverage and sequence coverage across a genomic region sampled by paired-end reads.
(c) Large-insert mate-pairs (3000-4000 bps) and large insert footprints (2000-5000 bps) compared with conventional paired-end reads (300-s) and conventional read footprints (500-800 bps) around the breakpoint between Region1 and Region2.
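The relationship in panel (b) can be sketched numerically. This is a minimal illustration, assuming the standard paired-end relation in which each insert of length I contributes two reads of length R, so that average physical coverage equals sequence coverage times I / (2R). The input values (30x sequence coverage, 100 bp reads, 500 bp and 3500 bp inserts) are assumptions chosen here because they reproduce the 75 and 525 quoted in the legend; the legend itself does not state them.

```python
def physical_coverage(sequence_coverage, insert_size, read_length):
    """Average physical coverage for a paired-end library.

    Assumes every insert yields two reads of `read_length` bases,
    so physical / sequence coverage = insert_size / (2 * read_length).
    """
    if insert_size <= 0 or read_length <= 0:
        raise ValueError("insert_size and read_length must be positive")
    return sequence_coverage * insert_size / (2 * read_length)


# Assumed inputs (not stated in the legend): 30x coverage, 100 bp reads.
short_insert = physical_coverage(30, 500, 100)   # conventional paired-end
large_insert = physical_coverage(30, 3500, 100)  # large-insert mate-pair

print(short_insert, large_insert)  # 75.0 525.0
```

With sequence coverage held fixed, the sevenfold larger insert gives a sevenfold higher physical coverage, which is the point the panel makes.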

Figure S2
Workflow labels: sheared genomic DNA (3-4 Kb); ligation of CAP adapters; gel size-selection (3-4 kb; markers at 4kb, 3kb, 2kb); circularization (internal adapter, biotin, nicks); nick translation (E. coli DNA Pol I); selective DNA digest (T7 exonuclease and S1 nuclease); biotin selection (magnet); ligation of paired-end adapters; PCR amplification; DNA isolation (markers at 400bp, 300bp, 200bp); paired-end sequencing (2x 100 nt); stripping adapter sequence; mapping to the reference (3.5 Kb).
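The "stripping adapter sequence" step can be illustrated with a small sketch. It assumes exact matching of a known internal adapter inside a read; the adapter string below is a made-up placeholder, not the sequence used in this protocol, and the function name is hypothetical. A production pipeline would normally also tolerate mismatches and partial adapters at read ends.

```python
def split_at_internal_adapter(read, adapter):
    """Split a read around the first exact occurrence of `adapter`.

    Returns (left, right): the genomic sequence before and after the
    adapter. If the adapter is absent, the whole read is returned as
    `left` and `right` is empty.
    """
    index = read.find(adapter)
    if index == -1:
        return read, ""
    return read[:index], read[index + len(adapter):]


# Hypothetical adapter and read, for illustration only.
ADAPTER = "CTGCTGTAC"
read = "ACGTACGT" + ADAPTER + "TTGCA"

left, right = split_at_internal_adapter(read, ADAPTER)
print(left, right)  # ACGTACGT TTGCA
```

Only the adapter-free segments would then be mapped to the reference.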

Figure S3
Panel labels: (a) donor, acceptor; breakpoint coordinates = (chrom, coord, dir) for each site. (b) Deletion. (c) Small insertion. (d) Tandem duplication. (e) Inversion; Breakpoint (->); rearranged region; discordant PE-reads from the rearranged region. (f) Non-reciprocal translocation. (g) Reciprocal translocation. (h) Duplicative translocation.

Breakpoint rules shown in the figure:
Deletion: dir() = +, dir() = +, coord() < coord()
Small insertion: dir() = +, dir() = +, coord() = coord()
Tandem duplication: dir() = +, dir() = +, coord() < coord()
Inversion: dir() = +, dir() = -, coord() < coord(); dir(c) = -, dir(d) = +, coord(c) < coord(d); coord() = coord(c), coord() = coord(d)
Reciprocal translocation: dir() = dir(d), dir(c) = dir(), coord() = coord(d), coord(c) = coord()
Duplicative translocation: coord() = coord(d), dir() = dir(d), dir() = dir(c)

Figure S4
Gel image: lanes SV1-SV10 with ladder lanes.

SV type                    SV_num  Breakpoint_id    STT (fresh frozen NPC5989)  STT (FFPE NPC5989)
Tandem duplication         SV1     _8_128-8_128
Coupled inversion          SV2     _1_198-1_203
                           SV3     _1_197-1_198
                           SV8     _1_197-1_203
Duplicative translocation  SV4     _1_154-8_128
Coupled inversion          SV5     _11_95-11_102    +                           +
                           SV6     _11_101-11_102   +                           +
                           SV7     _11_95-11_101    +                           +
Deletion                   SV9     _2_66-2_66       +                           +
Deletion                   SV10    _1_155-1_155

Figure S5

SV type                    SV_num  Breakpoint_id    Mate-pairs  TC (from MPs)  TC (from gel)  TC (from qPCR)  TC (from CNV)
Tandem duplication         SV1     _8_128-8_128     212         72%            92%, 72%       33%, 37%
Coupled inversion          SV2     _1_198-1_203     216         56%            88%, 88%       21%, 56%
                           SV3     _1_197-1_198     199                        91%            104%, 81%
                           SV8     _1_197-1_203     75                         81%, 49%       80%, 39%
Duplicative translocation  SV4     _1_154-8_128     204         70%            34%            42%             56%, 64%
Coupled inversion          SV5     _11_95-11_102    155         52%            71%, 81%       50%, 25%
                           SV6     _11_101-11_102   157                        34%, 77%       104%, 44%
                           SV7     _11_95-11_101    141                        81%, 44%       32%, 65%
Deletion                   SV9     _2_66-2_66       184         63%            137%, 141%     66%, 54%
Deletion                   SV10    _1_155-1_155     145         50%            58%            47%
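A rough sketch of how a mate-pair count could be turned into a tumor content estimate, and how the three breakpoints of a coupled inversion could be averaged. The ratio used here (supporting pairs divided by the number expected at 100% tumor content) is a simplifying assumption, not the authors' stated formula, and `EXPECTED_PAIRS` is a hypothetical placeholder, so the printed value is not meant to reproduce the 56% in the table. Only the counts 216, 199 and 75 come from the table.

```python
from statistics import mean


def tumor_content_from_pairs(supporting_pairs, expected_pairs_full_tumor):
    """Fraction of the breakpoint-spanning pairs expected at 100% tumor content.

    Simplifying assumption: support scales linearly with tumor content.
    """
    if expected_pairs_full_tumor <= 0:
        raise ValueError("expected_pairs_full_tumor must be positive")
    return supporting_pairs / expected_pairs_full_tumor


def averaged_tumor_content(pair_counts, expected_pairs_full_tumor):
    """Average the per-breakpoint estimates of one multi-breakpoint variant."""
    return mean(
        tumor_content_from_pairs(count, expected_pairs_full_tumor)
        for count in pair_counts
    )


# Hypothetical expectation; the source does not give this number.
EXPECTED_PAIRS = 300

# Mate-pair counts for SV2, SV3 and SV8 (first coupled inversion).
estimate = averaged_tumor_content([216, 199, 75], EXPECTED_PAIRS)
print(f"{estimate:.1%}")
```

Averaging damps the effect of one poorly supported breakpoint, such as the 75 pairs reported for SV8.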

Figure S6
(a) Primer design schematic: control primer, breakpoint primers (primer1, primer2), breakpoint.
(b) Gel images for SV1-SV10 with lanes T, N and W.

SUPPLEMENTAL FIGURE LEGENDS

Supplemental Figure 1. (a) Examples of structural variants and discordant read pairs. The unrearranged locus (left part of the figure) has only concordant read pairs, which map within the expected distance of each other. This distance is set by the library insert size distribution. The rearranged locus (right part of the figure) generates both concordant and discordant read pairs. Inserts spanning the breakpoint (the position where the blue and green genomic regions are fused) produce discordant read pairs. Other inserts from the adjacent regions (blue and green) are concordant since they do not cross the breakpoint. Provided sufficient genome coverage, breakpoints are represented by multiple independent read pairs originating from DNA inserts that span that breakpoint. Multiple discordant read pairs representing a single breakpoint are termed a read bundle. (b) Relationship between sequence coverage and physical coverage. A genomic region (depicted as a blue bar) is sampled by multiple inserts represented by read pairs that map to that region (depicted as narrow blue bars linked by the dashed purple lines, see the bottom of the panel). Sequence coverage (blue curve) represents the number of reads (the sequenced portion of the insert, narrow blue bars) that span any given base pair. Physical coverage (blue curve at top of panel) represents the number of inserts (reads as well as the unsequenced part of the insert, depicted by the dashed purple line) that cover any given base pair. The mathematical formulas in the right part of the panel provide the dependence between the average genome-wide physical coverage and the average sequence coverage. The two examples of the physical coverage calculation (75 and 525) illustrate how physical coverage increases when using larger inserts, even when sequence coverage is unchanged. (c) Illustration of differences associated with using short-insert fragment libraries and long-insert mate-pair libraries. Not only is the physical coverage lower with shorter inserts, but the breakpoint-associated read pairs have different distributions as well. In particular, reads from short inserts occupy narrow regions (referred to as footprints) next to the breakpoint. Reads from the large-insert library have a significantly bigger footprint.

Supplemental Figure 2. Large-insert mate-pair library construction. Genomic DNA is first sheared to the desired size range (3-4 Kb), followed by ligation of CAP adapters that lack the 5' phosphate group at the end of the short oligo. DNA fragments are then circularized via ligation of an internal biotin-labeled adaptor. Circular constructs contain single-stranded nicks at the sites of ligation due to the missing 5' phosphate group. Nicks are enzymatically moved in the 5' to 3' direction inside the DNA insert. T7 exonuclease recognizes the nicks and digests the nicked DNA in the 5' to 3' direction. The exposed ssDNA is then digested using S1 nuclease. Undigested biotin-labeled DNA is isolated using magnetic streptavidin beads. Illumina sequencing adapters are ligated onto the ends of the purified fragments on beads, and the library DNA is enriched by PCR and sequenced using the paired-end Illumina HiSeq protocol (2 x 101 bp). Sequencing reads are analyzed to strip the internal adapter sequence, and the resulting sequencing fragments are mapped to the human reference genome (hg19) and the EBV genome sequence. The resulting mapped read pairs are spaced by the average size of the library insert (3.5 Kb) and point outwards.

Supplemental Figure 3. Common types of structural variants. Structural variants can be represented by one or more breakpoints. (a) A somatic deletion is represented by a single breakpoint joining the donor site (A) and acceptor site (B) flanking the deleted region (green bar). The discordant tumor sample read pairs supporting the deletion are represented by arrows joined by dashed lines. The arrow colors represent the reference regions to which the reads map. In the case of a deletion, the observed insert size is significantly larger than the expected insert size. (b) A small insertion is an insertion that is smaller than the typical library insert size, and is represented by a single breakpoint. (c) Tandem duplication. (d) Representations of an inversion and the two associated breakpoints. (e) Unbalanced (duplicative) translocation, (f) balanced translocation, and (g) duplication.

Supplemental Figure 4. Structural variant comparison within different parts of the NPC5989 tumor tissue. Two parts of the NPC5989 tissue were analyzed using primers against breakpoints detected by SMASH. STT (fresh frozen NPC5989) represents DNA from NPC5989 frozen tissue used for whole-genome sequencing and variant discovery. STT (FFPE NPC5989) represents a different, archival part of the same tumor tissue and likely represents an earlier stage of the tumor, because a number of structural variants found in the frozen tissue are not detectable in it. However, the coupled inversion that produced the YAP1-MAML2 fusion is detectable in both samples, suggesting that this event is more likely to be a driver mutation than the other structural variants, which emerge later. A 25 Kb Chr2 deletion is also present in both the fresh and FFPE NPC5989 samples, suggesting that it also occurred early in tumor development. This variant deletes uncategorized gene K131224 and partially deletes uncategorized gene FJ16124.

Supplemental Figure 5. Summary of tumor content assessment. For each of the 10 breakpoints (SV1-10) and the corresponding structural variants, SMASH reported a certain number of supporting mate-pair reads. Based on these numbers and the expected physical coverage, we assessed the tumor content (TC from MPs column). The tumor content for the coupled inversions was obtained by averaging across the three underlying breakpoints. The TC (from gel) column represents tumor content assessed from the breakpoint-PCR experiment by measuring the band intensity on the agarose gel (Fig. S6, Supp. Table 5). The TC (from qPCR) column represents tumor content assessed from the breakpoint-qPCR experiment (Supp. Table 6). The TC (from CNV) column represents tumor content assessed from the increase in dosage of the telomeric end of Chr1 and the decrease in dosage of the telomeric end of Chr8 resulting from the duplicative translocation (Fig. 2).

Supplemental Figure 6. Breakpoint-PCR experiment. To assess tumor content using PCR, we designed four primers for each breakpoint (lanes SV1-10). (a) Two primers were placed on either side of the breakpoint and are expected to produce a breakpoint PCR product. Each of the two breakpoint primers was also paired with a control primer, which is expected to produce a control PCR product from the unrearranged allele. (b) The four primers were used together in a competitive PCR reaction, run on both the tumor sample (lane T) and the normal sample (lane N), which we used to distinguish the breakpoint band. Lane W represents the PCR product where we used water instead of sample DNA. All breakpoints SV1-10 were amplified this way; the PCR products were run on the gel, the breakpoint and control bands were identified, and their fluorescence intensities were measured using a densitometer. The relative amounts of breakpoint and control bands were determined from the measured band intensities (Supp. Table T5).
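The concordant/discordant distinction and the notion of a read bundle from Supplemental Figure 1(a) can be sketched as follows. This is an illustrative simplification, not the SMASH algorithm: a pair counts as concordant when both ends map to the same chromosome within an assumed distance range, read orientation is ignored, and discordant pairs are bundled by coarse position bins. The `ReadPair` type, the 2000-5000 bp range and the 5000 bp window are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_concordant(pair, min_dist, max_dist):
    """True if both ends map to one chromosome within the expected distance."""
    if pair.chrom1 != pair.chrom2:
        return False
    return min_dist <= abs(pair.pos2 - pair.pos1) <= max_dist


def bundle_discordant(pairs, min_dist, max_dist, window):
    """Group discordant pairs whose two ends fall into the same coarse bins."""
    bundles = {}
    for pair in pairs:
        if is_concordant(pair, min_dist, max_dist):
            continue
        key = (pair.chrom1, pair.pos1 // window, pair.chrom2, pair.pos2 // window)
        bundles.setdefault(key, []).append(pair)
    return bundles


pairs = [
    ReadPair("chr1", 1_000, "chr1", 4_500),      # concordant: 3500 bp apart
    ReadPair("chr1", 100_000, "chr8", 200_000),  # discordant: two chromosomes
    ReadPair("chr1", 100_400, "chr8", 200_900),  # discordant, same bins
]
bundles = bundle_discordant(pairs, min_dist=2_000, max_dist=5_000, window=5_000)
for key, members in bundles.items():
    print(key, len(members))  # ('chr1', 20, 'chr8', 40) 2
```

Fixed bins can split a true bundle that straddles a bin edge, so a real caller would cluster by distance between read ends instead.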