Nature Genetics: doi: /ng Supplementary Figure 1

Supplementary Figure 1 Processing of mutations and generation of simulated controls. On the left, a diagram illustrates the manner in which covariate-matched simulated mutations were obtained, filtered to remove potential false positives from mapping errors and split into experimental and validation subsets. The panel on the upper right shows the fraction of mutations in each RegulomeDB category that were filtered out owing to a high mismap score. Also depicted is a Venn diagram showing the number of mutations filtered out as potential false positives from mapping errors as well as the overlap of these mutations with difficult-to-align regions of the genome. These mutations are enriched in category 3b as well as in regions with no regulatory annotations (categories 6 and 7). The panel on the middle right shows the breakdown of transcript annotations for real and simulated mutations in each RegulomeDB category. The panel on the bottom right shows the distributions of replication timing and base-pair composition for simulated and real mutations for each cancer type. The panel on the bottom left shows the similarity in the distributions of the number of mutations per sample for the experimental and validation subsets in each cancer type.

Supplementary Figure 2 Mutation calling quality metrics. (a) The distribution of the variant allele fraction for each cancer type is shown via violin plots. (b) A scatter plot showing the relationship between genome sequencing file size and number of mutations called for that sample. (c) Box plots and overlaid points depict the median coverage of each sample grouped by cancer type. BRCA, breast invasive carcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; OV, ovarian serous cystadenocarcinoma; UCEC, uterine corpus endometrial carcinoma.

Supplementary Figure 3 Similarity of sets of transcription factor bound mutations. For each pair of transcription factors shown in Figure 2g, the Jaccard similarity was computed on the basis of the overlap in the genomic positions mutated with RegulomeDB transcription factor annotations for the two factors. Factors were clustered on the basis of this similarity score, and the scores are plotted here as a heat map. The average enrichment score of real versus simulated mutations for all cancer types for each transcription factor is shown below the transcription factor labels on the x axis.
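The Jaccard computation described in this legend can be illustrated with a minimal sketch. It assumes each factor is represented by a set of mutated genomic positions; the factor names, the positions and the helper names `jaccard` and `pairwise_jaccard` are hypothetical and are not taken from the study.

```python
from itertools import combinations


def jaccard(sites_a, sites_b):
    """Jaccard similarity |A and B| / |A or B| of two collections of sites."""
    a, b = set(sites_a), set(sites_b)
    union = a | b
    if not union:
        # Two empty sets share nothing; define their similarity as 0.
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(sites_by_factor):
    """Return a symmetric {(factor, factor): score} table for all pairs."""
    factors = sorted(sites_by_factor)
    scores = {(f, f): 1.0 for f in factors}
    for f, g in combinations(factors, 2):
        score = jaccard(sites_by_factor[f], sites_by_factor[g])
        scores[(f, g)] = score
        scores[(g, f)] = score
    return scores


# Hypothetical toy input: mutated positions as (chromosome, position) pairs.
toy_sites = {
    "TF_A": {("chr1", 100), ("chr1", 200), ("chr2", 50)},
    "TF_B": {("chr1", 100), ("chr2", 50), ("chr3", 10)},
    "TF_C": {("chr5", 1)},
}
toy_scores = pairwise_jaccard(toy_sites)
```

A table of this kind could then be passed to any hierarchical clustering routine and drawn as a heat map; the clustering method used for the figure is not stated in the legend.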

Supplementary Figure 4 Mutational patterns in transcription factor binding sites. (a) An analysis was performed to identify all transcription factor motifs with an increased match score in mutant sites compared to reference sites. Only mutations in sites for the CEBP factors were used for this analysis. (b) The sequences surrounding the mutations were aligned using TTG(T/C) as the seed. This seed motif, the aligned reference and the aligned mutant sequences are shown as well as a histogram of the number and type of mutations at each position. (c) The most common sequences of eight bases in length contributing to the motif in b are shown. (d) The counts of mutations from these sites by patient are shown. One patient with UCEC has a disproportionate number of these mutant sites. (e) Box plots of RNA-seq expression values for samples with and without CEBP mutations are shown for the factors matching CEBP motifs or motifs with a higher match score in a. (f) Seed, reference and variant alignments as well as mutation counts by position are shown for the factors from Figure 3f.
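The comparison of motif match scores between reference and mutant sites in panel a can be sketched as follows. This is only an illustration under stated assumptions: it uses a log-odds position weight matrix with a uniform background, scans the forward strand only, and the count matrix and sequences are invented. The scoring scheme actually used for the figure is not given in the legend.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(base, 0) for base in BASES) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def best_match_score(pwm, seq):
    """Highest PWM score over all forward-strand windows of the sequence."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def score_change(pwm, ref_seq, mut_seq):
    """Positive values mean the mutant sequence matches the motif better."""
    return best_match_score(pwm, mut_seq) - best_match_score(pwm, ref_seq)


# Hypothetical counts for a four-base motif resembling TTG(T/C).
toy_counts = [{"T": 10}, {"T": 10}, {"G": 10}, {"C": 8, "T": 2}]
toy_pwm = log_odds_pwm(toy_counts)

# Hypothetical site: an A>G change creates a TTGC match.
toy_delta = score_change(toy_pwm, "ATTAC", "ATTGC")
```

A fuller version would also scan the reverse complement and use a background model fitted to the genome.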

Supplementary Figure 5 Mutation probability fitting and model validation test. (a) Logistic regression allows for the calculation of the probability of mutation conditioned on replication timing, base-pair composition, transcript type and patient ID. Box plots of predicted probabilities across all patients are shown for the various combinations of transcript region, base-pair type and replication timing bin. (b) The fractions of sites identified in the validation set that can be found in the experimental set and vice versa are plotted, showing the robustness of the method even with a small number of patient samples. (c) A box plot depicting the difference in log10 RNA-seq expression data for PLCXD1 in samples either with or without a mutation at chr. X: 197,480. P value was determined by the bootstrap method, as the data were not normally distributed.
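The legend does not say which bootstrap procedure was used for panel c. One common variant, a two-sample bootstrap test for a difference in means, can be sketched as below. The function name `bootstrap_p_value`, the choice of the mean as the test statistic, the number of resamples and the example values are all assumptions made for illustration.

```python
import random
from statistics import mean


def bootstrap_p_value(x, y, n_boot=10000, seed=0):
    """Two-sided bootstrap P value for a difference in means between x and y.

    Both groups are shifted to share the pooled mean, which imposes the null
    hypothesis of equal means while keeping each group's own spread.
    """
    rng = random.Random(seed)
    mean_x, mean_y = mean(x), mean(y)
    observed = abs(mean_x - mean_y)
    pooled_mean = mean(list(x) + list(y))
    x_null = [value - mean_x + pooled_mean for value in x]
    y_null = [value - mean_y + pooled_mean for value in y]

    extreme = 0
    for _ in range(n_boot):
        # Resample each group with replacement at its original size.
        x_star = rng.choices(x_null, k=len(x_null))
        y_star = rng.choices(y_null, k=len(y_null))
        if abs(mean(x_star) - mean(y_star)) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_boot + 1)


# Hypothetical log10 expression values for mutated and unmutated samples.
mutated = [5.0, 5.1, 4.9, 5.2, 5.0]
unmutated = [1.0, 1.1, 0.9, 1.2, 1.0]
p_separated = bootstrap_p_value(mutated, unmutated, n_boot=999)
p_identical = bootstrap_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], n_boot=200)
```

Because resampling makes no distributional assumption, a test of this kind is a reasonable choice when expression values are not normally distributed, which is the motivation the legend gives.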

Supplementary Figure 6 Screening for functional mutated regulatory elements. Wild-type and mutant versions of four control regions and ten repeatedly mutated regulatory regions, including one of the TERT promoter mutations, were assayed for their ability to enhance the transcriptional activity of a minimal promoter using a luciferase assay. Constructs were assayed in NCI-H1437 lung adenocarcinoma cells.