Green Center Computational Core ChIP-Seq Pipeline, Just a Click Away

Green Center Computational Core ChIP-Seq Pipeline, Just a Click Away
Venkat Malladi, Computational Biologist, Computational Core
Cecil H. and Ida Green Center for Reproductive Biology Sciences

Introduction to the Green Center
Basic research in female reproductive biology, with a focus on signaling, gene regulation, and genome function.
pregnancy, parturition, stem cells, oncology, inflammation
Key areas:
- Chromatin structure and gene regulation
- Epigenetics
- Nuclear endpoints of cellular signaling pathways
- Genome organization and evolution
- DNA replication and repair

Who is in the Green Center?
Associated with the Department of Obstetrics and Gynecology
Consists of:
- 9 main faculty/labs
- 20 associated faculty/labs
- Computational Core
W. Lee Kraus, Ph.D., Director of the Green Center

Role of the Computational Core
Consists of 4 Computational Biologists
Analysis of genomic sequencing data
Responsibilities:
- Data quality assurance
- Perform basic analyses
- Work with investigators to perform integrative analyses
Green Center Computation Team: Anusha Nagari, Tulip Nandu, Venkat Malladi, Aishwarya Gogate

Challenge: Variety of Assays Supported
ATAC-seq, RNA-seq, GRO-seq
Modified from PLoS Biol 9: e1001046, 2011 (M. Pazin)

What is ATAC-seq?
Assay for Transposase-Accessible Chromatin using Sequencing (ATAC-seq): a genomic method that captures open chromatin sites.
Buenrostro et al. (2013) Nature Methods

What is RNA-Seq?
RNA Sequencing (RNA-Seq): measures the abundance of mature RNA species in the cell. These experiments contribute to the understanding of how RNA-based mechanisms impact gene regulation.
Types:
- Total RNA
- polyA mRNA (long and short)
- shRNA
- small RNA
- microRNA
- polyA-depleted RNA

What is GRO-Seq?
Global Run-On Sequencing (GRO-Seq): a genomic method that maps the position and orientation of all actively transcribing RNA polymerases. Transcription from all three RNA polymerases is captured, providing transcriptional profiles that include:
- protein-coding mRNAs
- long non-coding RNAs (lncRNAs)
- enhancer RNAs (eRNAs)
- divergent transcription
- antisense transcription
- intergenic transcription
in both annotated and unannotated regions of the genome.
[Figure: transcript categories labeled ERα Enhancer, Annotated, Intergenic, Divergent, Antisense, Other Genic]
Hah et al. (2011) Cell

What is ChIP-Seq?
Chromatin immunoprecipitation followed by Sequencing (ChIP-Seq): identifies the binding sites of chromatin-associated proteins.
Categories:
- Transcription factor ChIP-Seq: targets proteins that associate with specific DNA sequences to influence the rate of transcription
- Histone ChIP-Seq: measures the histone content of chromatin, specifically the incorporation of particular post-translational histone modifications
Park (2009) Nature Reviews

Considerations for Making a Pipeline
1. Who are the users?
2. Define what the pipeline should deliver
3. Identify all input and output files
4. Decide which QA/QC metrics should be available to users
5. Identify all software used in the pipeline
6. Break the pipeline down into discrete steps (based on deliverable files and metrics)

Users and Goals
Users:
- Wet lab scientists (grad students/postdocs)
- Computational Biologists in the Green Center
Goals:
- Allow wet lab scientists to quickly assess the quality of their data and explore it
- Allow for easily reproducible analysis within the Green Center

Schema: ChIP-seq Pipeline
[Diagram] Steps and tools: Quality (fastqc), Map (bowtie2), Remove Duplicates (picard), Cross-correlation, Call Peaks (macs2)
Files: FASTQ (SE/PE), BAM, tagAlign, bigWig, narrowPeak
Other outputs: fragment size, QA metrics
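The schema above can be written down as an ordered list of steps, which makes it easy to check that every step only consumes files produced earlier. The following is a minimal sketch, not the Green Center implementation: the tool names come from the slide, but the step names, file labels, and the input/output wiring (including which step emits the bigWig) are an assumed reading of the flattened diagram.

```python
# Hypothetical description of the pipeline. Tool names (fastqc, bowtie2,
# picard, macs2) come from the slide; step names, file labels, and the
# wiring between steps are assumptions made for illustration only.
PIPELINE = [
    {"step": "quality", "tool": "fastqc",
     "inputs": ["fastq"], "outputs": ["fastqc_metrics"]},
    {"step": "map", "tool": "bowtie2",
     "inputs": ["fastq"], "outputs": ["bam"]},
    {"step": "remove_duplicates", "tool": "picard",
     "inputs": ["bam"], "outputs": ["dedup_bam"]},
    # The slide names no tool for the cross-correlation step.
    {"step": "cross_correlation", "tool": None,
     "inputs": ["dedup_bam"], "outputs": ["tagalign", "fragment_size"]},
    {"step": "call_peaks", "tool": "macs2",
     "inputs": ["tagalign", "fragment_size"],
     "outputs": ["narrowpeak", "bigwig"]},
]


def validate_order(pipeline, initial_files=("fastq",)):
    """Check that each step's inputs exist before the step runs.

    Returns the set of all file labels available after the last step.
    Raises ValueError if a step needs a file nothing has produced yet.
    """
    available = set(initial_files)
    for step in pipeline:
        missing = [name for name in step["inputs"] if name not in available]
        if missing:
            raise ValueError(
                f"step {step['step']!r} is missing inputs: {missing}"
            )
        available.update(step["outputs"])
    return available
```

Calling `validate_order(PIPELINE)` returns the full set of deliverable labels. Reordering the list so that a step runs before its inputs exist raises `ValueError`, which is the kind of check that helps when breaking a pipeline into discrete steps.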

FASTQ: Quality Metrics
FastQC Report
Summary:
- Basic Statistics
- Per base sequence quality
- Per sequence quality scores
- Per base sequence content
- Per base GC content
- Per sequence GC content
- Per base N content
- Sequence Length Distribution
- Sequence Duplication Levels
- Overrepresented sequences
- Kmer Content
Basic Statistics:
Measure             Value
Filename            HF_K9_GATCAG_L005_R1_001.fastq.gz
File type           Conventional base calls
Encoding            Sanger / Illumina 1.9
Total Sequences     22571166
Filtered Sequences  0
Sequence length     50
%GC                 42
[Figure: Per Base Sequence Quality, with regions marked as good quality calls, reasonable quality calls, and poor quality calls]

Alignment: Quality Metrics
FASTQ file: DNA sequence
Aligned file: DNA sequence + genomic localization
Alignment % = (No. of aligned reads / Total no. of raw reads) * 100
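The alignment percentage defined above is a single ratio. A minimal sketch follows; the function name and the read counts in the usage note are illustrative, not values from the pipeline.

```python
def alignment_percent(aligned_reads: int, raw_reads: int) -> float:
    """Alignment % = (aligned reads / total raw reads) * 100."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    if not 0 <= aligned_reads <= raw_reads:
        raise ValueError("aligned_reads must be between 0 and raw_reads")
    return 100.0 * aligned_reads / raw_reads
```

For example, with hypothetical counts of 963 aligned reads out of 1,000 raw reads, `alignment_percent(963, 1000)` gives an alignment rate of about 96.3%.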

Uniquely Mapped Reads: Quality Metrics
Depth: number of uniquely mapping reads
Library complexity:
- Non-Redundant Fraction (NRF): number of distinct uniquely mapping reads (i.e. after removing duplicates) / total number of reads
- PCR Bottlenecking Coefficient 1 (PBC1): PBC1 = M1/M_DISTINCT, where
  M1: number of genomic locations where exactly one read maps uniquely
  M_DISTINCT: number of distinct genomic locations to which some read maps uniquely
- PCR Bottlenecking Coefficient 2 (PBC2): PBC2 = M1/M2, where
  M1: number of genomic locations where exactly one read maps uniquely
  M2: number of genomic locations where exactly two reads map uniquely
ENCODE Standards: https://www.encodeproject.org/data-standards/chip-seq/
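The three library-complexity metrics above can be computed from the mapped locations of the uniquely mapping reads. This is a minimal sketch under stated assumptions: each read is represented by a hashable location such as `(chrom, start, strand)`, the function name and the optional `total_reads` argument are hypothetical, and when `total_reads` is omitted the number of locations passed in is used as the denominator for NRF.

```python
from collections import Counter


def library_complexity(locations, total_reads=None):
    """Compute NRF, PBC1, and PBC2 from uniquely mapped read locations.

    locations: iterable of hashable genomic locations, one per uniquely
               mapped read (assumed representation: (chrom, start, strand)).
    total_reads: denominator for NRF; defaults to the number of locations.
    """
    counts = Counter(locations)
    n_reads = sum(counts.values())
    if n_reads == 0:
        raise ValueError("no reads supplied")
    if total_reads is None:
        total_reads = n_reads

    m_distinct = len(counts)                           # locations with >= 1 read
    m1 = sum(1 for c in counts.values() if c == 1)     # exactly one read
    m2 = sum(1 for c in counts.values() if c == 2)     # exactly two reads

    return {
        "NRF": m_distinct / total_reads,
        "PBC1": m1 / m_distinct,
        # PBC2 is undefined when no location has exactly two reads.
        "PBC2": m1 / m2 if m2 else float("inf"),
    }
```

With six toy reads at three locations (two, one, and three reads per location), the function reports NRF = 0.5, PBC1 = 1/3, and PBC2 = 1.0. Returning infinity for PBC2 when M2 is zero is a design choice of this sketch, not something the slide specifies.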

Uniquely Mapped Reads: Quality Metrics (cont.)
[Tables: NRF Guidelines, PBC1 Guidelines, PBC2 Guidelines]
ENCODE Standards: https://www.encodeproject.org/data-standards/chip-seq/

Alignment: Quality Metrics Report
Sample Information     Raw reads     Alignment %
Control Replicate 1    28,259,069    96.30%
Control Replicate 2    28,892,302    96.00%
Sample 2 Replicate 1   23,239,486    96.10%
Sample 2 Replicate 2   25,637,094    96.90%
Sample 3 Replicate 1   22,713,054    96.60%
Sample 3 Replicate 2   20,419,272    95.90%
Sample 4 Replicate 1   22,617,154    96.60%
Sample 4 Replicate 2   20,068,460    96.00%

Cross-correlation: Quality Metrics Report
[Figure: two panels, Sample 1 (R = 0.99) and Sample 2 (R = 0.99)]
R: Pearson correlation coefficient
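The R value reported above is a Pearson correlation coefficient. The slide does not say which values are being correlated; the sketch below assumes two equal-length lists of per-bin signal values, for example from two replicates of one sample, and the function name is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two paired values are required")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Two replicates whose signal rises and falls together across bins give a value close to 1, as in the report above; perfectly proportional inputs such as `[1, 2, 3]` and `[2, 4, 6]` give a coefficient of 1 up to floating-point rounding.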

Call Peaks: Quality Metrics Report
1. Peak calls for individual replicates
2. Overlapping peaks between the pooled pseudo-replicates
3. bigWig files (UCSC Genome Browser, IGV)

Call Peaks: Quality Metrics Report
Visualizing signal tracks (bigWig files) in the UCSC Genome Browser
Franco et al. (2015)

Working With BioHPC and Astrocyte

Creating a Project
Create New Project to run analysis

Adding Data
Select "Add Data to this Project..."

ChIP-Seq Workflow
Inputs: ChIP-Input fastq files; Sequence format; ChIP TF or Histone fastq files; Assembly

Run Time of ChIP-Seq Pipeline

Thank you! Questions?