

[Figure: False mutation detection caused by WGA artifacts. Panel labels: original genome; artifacts in amplified DNA; false detection of local variants; true sSNV mistaken as error; artificial chimera; false LOH; early MDA error as sSNV.]

Supplementary Figure. False mutation detection due to whole-genome amplification artifacts. From left to right: false negative SNV detection due to allelic distortion; false positive translocation detection due to artificial chimera; false positive loss-of-heterozygosity due to chromosomal segments dropping out of amplification; false positive SNV detection due to early amplification errors.

There are two major differences between allelic distortion and chromosomal segments dropping out of amplification. First, allelic distortion is primarily caused by preferential amplification of one chromosome allele; from the auto-correlation analysis, such preferential amplification extends roughly to the amplicon size ( kb). By contrast, false LOH here refers to much longer chromosomal segments (> Mb) that drop out of amplification. Second, deep targeted sequencing by PCR amplification from the MDA product can uncover both chromosome alleles even under allelic distortion, but it cannot reveal a chromosome segment that has dropped out of amplification; a biological replicate is needed to determine whether such a dropout is an artifact or a true deletion.

[Figure panels: (a) Coverage correlations at different scales in bulk DNA sequencing and single-cell sequencing (read level, fragment level, amplicon level); (b) Amplicon-level correlation fitting in RPE libraries by MDA (MiSeq and HiSeq); (c) Coverage predictions for single RPE MDA libraries (prediction vs. observation; x-axis: normalized depth).]

Supplementary Figure. Analysis of amplification bias and genome coverage in MDA-generated libraries of RPE- cells. (a) Correlation at the scale of sequencing read length and fragment length is present in both bulk and single-cell libraries (- bp), but correlation at the scale of amplicons ( kb) appears only in MDA-derived libraries. (b) Amplicon-level correlation independent of the sequencing depth is demonstrated in disomic Chr. in three MDA-generated libraries of two-cell RPE samples. (c) Independent sequencing of three MDA-generated RPE libraries to .x and to .x confirms that the bin-level depth-of-coverage predicted from low-pass sequencing agrees with the observed single-base depth-of-coverage in higher-depth sequencing, reflecting the dominant non-uniformity generated at the amplicon level. All correlation functions are calculated for disomic Chr.
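The coverage correlation described in this caption can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes coverage has already been summarized as read depth in equal-width bins, computes a normalized auto-correlation by lag, and uses a simple 1/e cutoff (an assumption made here for illustration, in place of the curve fitting referred to in panel b) to read off a correlation length. The function names and the toy depth vector are hypothetical.

```python
import math


def autocorrelation(depth, max_lag):
    """Normalized auto-correlation of binned depth at lags 0..max_lag."""
    n = len(depth)
    mean = sum(depth) / n
    dev = [d - mean for d in depth]
    var = sum(x * x for x in dev) / n
    if var == 0:
        raise ValueError("coverage is constant; correlation is undefined")
    acf = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        cov = sum(dev[i] * dev[i + lag] for i in range(pairs)) / pairs
        acf.append(cov / var)
    return acf


def correlation_length(acf, bin_size, threshold=1 / math.e):
    """First lag (in bp) where the correlation drops below the threshold.

    The 1/e cutoff is an illustrative assumption, not the fitted value
    reported in the figure.
    """
    for lag, value in enumerate(acf):
        if value < threshold:
            return lag * bin_size
    return None


# Toy depth profile: alternating covered and uncovered blocks of two bins.
toy_depth = [2, 2, 0, 0, 2, 2, 0, 0]
toy_acf = autocorrelation(toy_depth, max_lag=2)
toy_length = correlation_length(toy_acf, bin_size=1000)
```

In this toy profile the correlation decays within one bin and turns negative at a lag of two bins, which is the block period; on real data the decay scale would be compared across bulk and amplified libraries.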

[Figure panels: (a) Depth-of-coverage yield curve for bulk and MDA libraries (bulk RPE library; single RPE MDA library; axes: depth, depth / mean depth, normalized DoC yield); (b) Normalized depth-of-coverage yield curve at different bin sizes (bin-level vs. base-level coverage; optimal bin size = fragment length for the bulk library, amplicon size for the MDA library); (c) Lorenz distribution curves for bulk and MDA libraries evaluated at different bin sizes (axes: cumulative fraction, cumulative read count fraction).]

Supplementary Figure. Coverage non-uniformity in bulk and MDA-derived libraries of RPE- cells evaluated at different bin sizes. (a) The depth-of-coverage (DoC) curves at different sequencing depths reflect the same underlying density distribution of the coverage at different loci in the sequencing library when the coverage depth is normalized by the mean sequencing depth. (b) The depth-of-coverage yield evaluated at the single-base level is almost identical to the depth-of-coverage evaluated at the level where the dominant variation is generated. For bulk DNA libraries, the dominant variation occurs at the level of sequencing fragments (presumably in the PCR step of library preparation). Therefore, the DoC curve evaluated at bin size - bp is indistinguishable from the DoC curve evaluated at the single-base level, but at larger bin sizes, the coverage bias is further attenuated. For MDA-generated libraries, the DoC curve is almost invariant until the bin size reaches the level of single amplicons ( kb). Therefore, the DoC curve at the single-base level can be reliably estimated from the bin-level ( kb) coverage in low-pass sequencing data (.x). (c) Lorenz curves evaluated at different bin sizes also confirm the dominant variation at bp scale for the bulk DNA library and at kb scale for the MDA library. In the bulk DNA library (left panel), the Lorenz curve for the bin-level coverage (black curves) is similar to the Lorenz curve at the single-base level (red curve) but deviates towards more uniformity as the bin size exceeds bp. In the MDA library sequenced to .x (middle panel), the Lorenz curve is essentially unchanged until the bin size reaches kb and deviates towards more uniformity as the bin size exceeds kb, consistent with an amplicon size of kb. In the same MDA library sequenced to .x (right panel), the majority of the chromosome is uncovered. In the uncovered region, amplification non-uniformity cannot be evaluated from the single-base level coverage (red curve); however, the bin-level coverage ( kb bins) agrees well with the single-base level coverage observed in .x sequencing (dashed red curve), suggesting that this distribution is intrinsic to the MDA and does not depend on the sequencing depth. All Lorenz curves are evaluated for disomic Chr.
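The effect of bin size on the Lorenz curve can be sketched as follows. This is a minimal illustration, assuming per-bin read counts are already available; `rebin`, `lorenz_curve`, and `gini` are hypothetical helper names, and the Gini index is added here only as a one-number summary of how far a curve lies from the uniform diagonal.

```python
def rebin(counts, factor):
    """Sum consecutive groups of `factor` bins; a trailing partial group is dropped."""
    usable = len(counts) - len(counts) % factor
    return [sum(counts[i:i + factor]) for i in range(0, usable, factor)]


def lorenz_curve(counts):
    """Points (cumulative fraction of bins, cumulative fraction of reads)."""
    ordered = sorted(counts)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no reads in any bin")
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, count in enumerate(ordered, start=1):
        running += count
        points.append((i / n, running / total))
    return points


def gini(counts):
    """Gini index: 1 minus twice the trapezoid area under the Lorenz curve."""
    pts = lorenz_curve(counts)
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area


# Toy counts with all variation at the single-bin scale.
fine_counts = [0, 8, 0, 8]
gini_fine = gini(fine_counts)              # non-uniform at the fine scale
gini_coarse = gini(rebin(fine_counts, 2))  # uniform once bins exceed that scale
```

The toy counts vary only at the single-bin scale, so merging two bins removes the non-uniformity entirely. This mirrors the caption's observation that the curve moves toward uniformity only once the bin size exceeds the scale at which the dominant variation is generated.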

[Figure panels: (a) Correlation analysis of single glioblastoma MDA libraries (by sequencing depth and by chromosome copy number); (b) Coverage predictions for single glioblastoma MDA libraries (prediction vs. observation for chromosomes with different copy number; x-axis: normalized depth).]

Supplementary Figure. Analysis of amplification bias and genome coverage in MDA-generated libraries of frozen glioblastoma nuclei (GBM) in Francis et al. (). (a) (Left) The same amplicon-level correlation (red dashed curve, l_c = 7. kb) is revealed from independent sequencing of a single glioblastoma library (GBM #) to .x, x, and x. (Right) Chromosomes with different copy number show the same amplicon size but different magnitude of amplification bias, reflecting differential priming efficiencies (from one copy of a monosomic chromosome vs. two copies of disomic chromosomes) during the early stage of amplification. (b) The cumulative coverage curve at higher sequencing depths can be accurately predicted from the bin-level coverage at lower depths for chromosomes with different copy number.
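One way to picture how a low-depth, bin-level profile could predict coverage at a higher depth is sketched below. It rests on an assumption introduced here, not taken from the source: within each bin, reads are sampled as a Poisson process whose rate is the target mean depth scaled by that bin's normalized depth. The function names are hypothetical, and the authors' actual prediction method may differ.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i)
        cdf += term
    return max(0.0, 1.0 - cdf)


def predicted_fraction_covered(bin_depths, target_mean_depth, min_reads=1):
    """Predicted fraction of positions with at least `min_reads` reads.

    Assumes Poisson sampling within each bin (an illustrative assumption).
    """
    mean = sum(bin_depths) / len(bin_depths)
    if mean == 0:
        raise ValueError("no coverage in the low-pass data")
    total = 0.0
    for depth in bin_depths:
        lam = target_mean_depth * depth / mean
        total += poisson_sf(min_reads, lam)
    return total / len(bin_depths)


uniform_prediction = predicted_fraction_covered([1, 1], target_mean_depth=1.0)
dropout_prediction = predicted_fraction_covered([0, 2], target_mean_depth=10.0)
```

With a bin that received no amplification, the predicted coverage stays below one half no matter how deep the sequencing goes, which is the qualitative behavior of a yield curve dominated by amplification bias.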

[Figure panels: (a) Auto-correlation analysis of single and bulk DNA libraries in Hou et al. () (read level, fragment level, amplicon level); (b) Prediction of allelic coverages of single-cell DNA libraries in Hou et al. () (observation vs. prediction from locus-level coverage under MM and SM; y-axis: percentage of SNPs); (c) YH bulk DNA library showed considerable GC bias at the fragment level (RPE bulk vs. YH bulk; axes: Chr. position (Mb), AT %, relative abundance).]

Supplementary Figure. Analysis of amplification bias and genome/allele coverage in bulk and MDA-derived libraries of diploid lymphoblastoid YH cells. (a) Comparison of coverage correlation in single-cell and bulk libraries confirms the dominant bias occurring at the amplicon level ( kb). There is considerable fragment-level correlation in both the bulk and one of the two single-cell libraries that is likely introduced during the PCR step of library preparation. (b) The coverage at heterozygous sites (reference, alternate, or both alleles combined) evaluated at high sequencing depths agrees well with predictions from low-pass sequencing. (c) The fragment-level bias in the bulk DNA library shows a strong correlation with the local sequence content (AT%) that is absent in the bulk library of RPE- cells shown in Supplementary Figure.

[Figure panels: (a) Auto-correlation analysis of single and bulk MDA libraries in Evrony et al. () for MDA of single neurons and of pooled neurons, with a per-sample table of read-level and amplicon-level correlation lengths; (b) Correlation between GC content and library representation at the amplicon level (axes: GC %, relative abundance, number of bins with given GC).]

Supplementary Figure. Analysis of coverage correlation in MDA-generated libraries of frozen neuron nuclei in Evrony et al. (). (a) MDA-generated libraries of single neurons, hundreds of neurons, or thousands of neurons show the same amplicon-level correlation, reflecting an amplicon size of kb. The short-range correlation reflects the read length ( bp) at low sequencing depths. (b) Comparison of the relative abundance of sequence coverage with local GC content at the amplicon level (evaluated at kb bins) suggests that there is a wide range of coverage variation in regions with similar GC content. By contrast, there is little variation at kb scale in the bulk library. Error bars show the standard deviation for each GC stratum. The variation is especially evident for single-neuron libraries, where the range is proportional to the total number of bins with similar GC content, reflecting random rather than sequence-content-generated bias.

[Figure panels: (a) Auto-correlation analysis of single sperm DNA libraries in Wang et al. (), with a per-sample table of correlation length and correlation magnitude; (b) Coverage predictions (y-axis: percentage of SNPs); (c) Error rate estimates (y-axis: % sites with > ref. and alt. reads, per sperm library).]

Supplementary Figure 7. Analysis of coverage variation in MDA-generated libraries of single sperms in Wang et al. (). (a) Different single-sperm libraries exhibit dominant bias at - kb range, consistent with the MDA amplicon size. (b) Cumulative coverage at higher sequencing depth (7-x) can be accurately predicted from the amplicon-level coverage evaluated from low-pass sequencing (<x). (c) About % sites showed heterozygosity (with > reads of both reference and alternate alleles) in the haploid sperm libraries, indicating an error frequency of % at allele frequency % or above.
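The error-rate estimate in panel (c) amounts to counting sites in a haploid library where both alleles are supported by more than a threshold number of reads. The sketch below shows that counting step only; the read threshold is left as a parameter because its value is not recoverable from the text above, and the site data and function name are made up for illustration.

```python
def apparent_het_fraction(sites, min_reads):
    """Fraction of sites with more than `min_reads` reads for BOTH alleles.

    `sites` is a list of (ref_reads, alt_reads) tuples. In a haploid cell,
    such sites are expected to arise from amplification or sequencing errors.
    """
    if not sites:
        raise ValueError("no sites supplied")
    flagged = sum(1 for ref, alt in sites if ref > min_reads and alt > min_reads)
    return flagged / len(sites)


# Hypothetical read counts at four known SNP positions.
toy_sites = [(10, 0), (8, 3), (0, 12), (5, 5)]
toy_fraction = apparent_het_fraction(toy_sites, min_reads=2)
```

A fuller analysis would probably also filter on minor allele fraction, as the caption's reference to an allele-frequency cutoff suggests.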

[Figure panels: (a) Coverage correlations in MALBAC, MDA, and bulk SW libraries (Zong et al. ); (b) Snapshots of sequencing coverage in MDA and MALBAC libraries; (c) Coverage predictions for single SW- libraries (prediction vs. observation for MDA and MALBAC).]

Supplementary Figure. Analysis of coverage variation in MDA- and MALBAC-generated libraries of SW tumor cells in Zong et al. (). (a) All five MALBAC-generated single SW libraries (blue symbols) show a much shorter amplicon compared to the MDA-generated library (red squares). (b) Snapshots of local sequencing coverage confirm the shorter amplicons in MALBAC in comparison to MDA. (c) In both MDA- and MALBAC-generated libraries, the dominant source of non-uniformity exists at the amplicon level and can be accurately predicted from low-pass sequencing.

[Figure panels: (a) Coverage correlations in DOP-PCR amplified single-cell libraries (Wang et al. ) (read-level and amplicon-level correlation); (b) Correlation magnitude and spread (amplicon size) in DOP-PCR libraries (Wang et al. ) (axes: correlation length (bp), correlation magnitude); (c) Correlation magnitude and spread (amplicon size) in MDA and MALBAC libraries, grouped by sample type and ploidy (axes: correlation length (kb), correlation magnitude).]

Supplementary Figure 9. Analysis of coverage variation in DOP-PCR libraries of tumor nuclei in Wang et al. (). (a) Different libraries (9 total) exhibit similar fragment-level correlations (l_c = bp) and amplicon-level bias (l_c = bp). (b) Magnitude and spread of DOP-PCR bias. The amplicon size (estimated by the correlation length) agrees well with the size selection protocol in the original report (Navin et al., ). The magnitudes of correlation in DOP-PCR libraries are generally smaller than those observed in MDA-generated libraries as shown in (c), reflecting less overall bias in DOP-PCR. (c) Magnitude and spread of MDA bias in different studies. More DNA content in N samples results in better uniformity than N or N samples. This is consistent with improved uniformity of MDA from nuclei in Wang et al. ().

[Figure: Mathematical models of allelic amplification. Labels: combined coverage from both alleles; locus-level read counts and coverage (f); mixed template model (MM): symmetric binomial distribution at each locus; segregated template model (SM): independent distribution at each locus; allele read counts for allele A and allele B; allelic coverage (λ).]

Supplementary Figure. Schematic illustration of two models for allele bias. Details of the derivations are given in the Methods section.
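The difference between the two models can be made concrete with a small calculation of allele dropout, that is, the probability that a given allele receives no reads. The sketch below is a simplified reading of the schematic and not the derivation from the Methods section. For the mixed template model it assumes each of the n reads at a locus comes from allele A with probability 1/2. For the segregated template model it assumes the two alleles have independent and identical read-count distributions, so the probability of an empty locus is the square of the allele dropout probability. The function names and the toy distribution are hypothetical.

```python
import math


def allele_dropout_mixed(locus_pmf):
    """MM sketch: allele A count given n locus reads ~ Binomial(n, 1/2).

    `locus_pmf` maps locus read count n to its probability f(n).
    """
    return sum(p * 0.5 ** n for n, p in locus_pmf.items())


def allele_dropout_segregated(locus_pmf):
    """SM sketch: alleles are i.i.d., so f(0) = P(allele has 0 reads) ** 2."""
    return math.sqrt(locus_pmf.get(0, 0.0))


# Hypothetical locus-level read count distribution f(n).
toy_pmf = {0: 0.25, 1: 0.5, 2: 0.25}
dropout_mm = allele_dropout_mixed(toy_pmf)
dropout_sm = allele_dropout_segregated(toy_pmf)
```

The same locus-level distribution therefore yields different allele-level predictions under the two models, which is what allows observed allelic coverage at heterozygous sites to discriminate between them.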

[Figure panels: (a) Allelic coverage predictions for single RPE MDA libraries (observation for combined, ref, and alt alleles vs. prediction from locus-level coverage under MM and SM; y-axis: percentage of SNPs); (b) Comparison of allelic coverage in disomic and monosomic chromosomes in GBM libraries (MM and SM predictions vs. locus coverage observations; axes: normalized DoC, fraction of allele).]

Supplementary Figure. Allele-level amplification bias in fresh RPE- and frozen glioblastoma libraries generated by MDA. (a) Comparison of the allele coverage (A or B) predicted from the locus-level coverage (A+B) in two-cell RPE samples with the observation at heterozygous sites in Chr. The observation suggests that amplification of homologous chromosomes more closely resembles the mixed template model (MM). (b) Comparison of the allele coverage prediction from a disomic chromosome with the observed coverage in a monosomic chromosome in frozen glioblastoma nuclei. The agreement between the segregated template model (SM) prediction and the observed coverage in the monosomic chromosome further supports that each chromosome in the single-cell genome is independently amplified.

[Figure panels: (a) RPE #: Positive correlation between ref. and alt. allele counts (SNPs covered at given depth; mean depth of allele B vs. allele A depth; % of allele B covered at fixed allele A depth, with MM prediction); (b) GBM #: Independence between ref. and alt. allele counts (same layout, with SM prediction).]

Supplementary Figure. Distinct correlations between the amplification of homologous chromosomes. (a) The depth-of-coverage curves of reference (A) and alternate (B) alleles are identical (left panel) but exhibit a positive correlation (middle panel). The depth-of-coverage curve of allele B at sites where the A allele is covered at different depths (x, x, x) also shows a positive correlation with the allele A coverage but can be predicted by the method described above (right panel). The correlation between allele A and allele B coverage indicates that over-amplified regions are not dominated by amplification of one homologue but tend to represent both homologous chromosomes. (b) In a representative glioblastoma library, the depth-of-coverage curves of reference (A) and alternate (B) alleles are still identical, but the correlation between allele A and allele B coverage is absent (middle panel). The depth-of-coverage curves of allele B at sites where the A allele is covered at different depths (x, x, x) are almost identical (right panels). These results demonstrate independent amplification of the two homologous chromosomes.
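The contrast between correlated and independent allele amplification can be checked with a plain Pearson correlation between the read counts of the two alleles at heterozygous sites. The sketch below is a generic illustration using made-up counts; it is not the analysis behind the figure, and the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dev_x, dev_y))
    var_x = sum(a * a for a in dev_x)
    var_y = sum(b * b for b in dev_y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Made-up allele A / allele B read counts at four heterozygous sites.
allele_a = [1, 2, 3, 4]
co_amplified_b = [2, 4, 6, 8]   # B tracks A, as when both homologues co-amplify
independent_b = [3, 1, 1, 3]    # B carries no linear information about A
r_co_amplified = pearson(allele_a, co_amplified_b)
r_independent = pearson(allele_a, independent_b)
```

A coefficient near one corresponds to the behavior described for panel (a), and a coefficient near zero to the behavior described for panel (b).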