B&DA Committee Bioinformatics and Data Analysis. PAG January 2016

Similar documents
Green Center Computational Core ChIP- Seq Pipeline, Just a Click Away

Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017

ChIP-Seq Data Analysis. J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015

RNA-Sequencing analysis

Long and short/small RNA-seq data analysis

Bioinformatics of Transcriptional Regulation

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

Data and Metadata Models Recommendations Version 1.2 Developed by the IHEC Metadata Standards Workgroup

RNA-Seq Software, Tools, and Workflows

CHAPTER 21 LECTURE SLIDES

Introduction to genome biology

Introduction to RNA-Seq

Gene Regulation Solutions. Microarrays and Next-Generation Sequencing

RNA-Seq Workshop AChemS Sunil K Sukumaran Monell Chemical Senses Center Philadelphia

Bioinformatics Advice on Experimental Design

Incorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits

Course Presentation. Ignacio Medina Presentation

Introductory Next Gen Workshop

Single Cell Genomics

Gene Expression: Transcription

RNAseq Differential Gene Expression Analysis Report

Ensembl Funcgen: A Database and API for Epigenomics and Gene Regulation Data.

De novo metatranscriptome assembly and coral gene expression profile of Montipora capitata with growth anomaly

Higher Human Biology Unit 1: Human Cells Pupils Learning Outcomes

Mapping strategies for sequence reads

2 Gene Technologies in Our Lives

RADSeq Data Analysis. Through STACKS on Galaxy. Yvan Le Bras Anthony Bretaudeau Cyril Monjeaud Gildas Le Corguillé

Genomics and Transcriptomics of Spirodela polyrhiza

DNA Structure and Analysis. Chapter 4: Background

Genomic Data Analysis Services Available for PL-Grid Users

Year III Pharm.D Dr. V. Chitra

Differential Gene Expression

Microarray Gene Expression Analysis at CNIO

Epigenetic CHaracterization and Observation (ECHO) Proposers Day

Wednesday, November 22, 17. Exons and Introns

measuring gene expression December 5, 2017

How much sequencing do I need? Emily Crisovan Genomics Core

Introduction to the UCSC genome browser

SMARTer Ultra Low RNA Kit for Illumina Sequencing Two powerful technologies combine to enable sequencing with ultra-low levels of RNA

Section DNA: The Molecule of Heredity

Non-Organic-Based Isolation of Mammalian microrna using Norgen s microrna Purification Kit

Machine Learning Methods for RNA-seq-based Transcriptome Reconstruction

SOLiD Total RNA-Seq Kit SOLiD RNA Barcoding Kit

Welcome to the NGS webinar series

Human Fibroblast IMR90 Hi-C Data (Dixon et al.)

Automated size selection of NEBNext Small RNA libraries with the Sage Pippin Prep

SCALABLE, REPRODUCIBLE RNA-Seq

WORKSHOP. Transcriptional circuitry and the regulatory conformation of the genome. Ofir Hakim Faculty of Life Sciences

DNA makes RNA makes Proteins. The Central Dogma

Shuji Shigenobu. April 3, 2013 Illumina Webinar Series

Differential Gene Expression

The ChIP-Seq project. Giovanna Ambrosini, Philipp Bucher. April 19, 2010 Lausanne. EPFL-SV Bucher Group

Index. E Electrophoretic Mobility Shift Assay (EMSA), 262 ENCODE project, 223, 224 European Nucleotide Archive (ENA), 34

Next Generation Sequencing: An Overview

Multi-omics in biology: integration of omics techniques

A step-by-step guide to ChIP-seq data analysis

Chapter 11. Gene Expression and Regulation. Lectures by Gregory Ahearn. University of North Florida. Copyright 2009 Pearson Education, Inc..

Computational Biology I LSM5191 (2003/4)

Sanger vs Next-Gen Sequencing

Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX

Analysing genomes and transcriptomes using Illumina sequencing

RIPTIDE HIGH THROUGHPUT RAPID LIBRARY PREP (HT-RLP)

Product selection guide Ion S5 and Ion S5 XL Systems

M Keramatipour 2. M Keramatipour 1. M Keramatipour 4. M Keramatipour 3. M Keramatipour 5. M Keramatipour

Chapter 12 Packet DNA 1. What did Griffith conclude from his experiment? 2. Describe the process of transformation.

Fig Ch 17: From Gene to Protein

Transcription in Eukaryotes

Genomics and Gene Recognition Genes and Blue Genes

Introduction to RNA sequencing

Identifying and mitigating bias in next-generation sequencing methods for chromatin biology

Next-Generation Sequencing. Technologies

produces an RNA copy of the coding region of a gene

DNA Transcription. Dr Aliwaini

RNA-seq Data Analysis

Introduction Bioo Scientific

Chapter 15 Gene Technologies and Human Applications

Quality assessment and control of sequence data. Naiara Rodríguez-Ezpeleta

RNA-Seq analysis workshop. Zhangjun Fei

Ecole de Bioinforma(que AVIESAN Roscoff 2014 GALAXY INITIATION. A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech

Top 5 Lessons Learned From MAQC III/SEQC

NOW GENERATION SEQUENCING. Monday, December 5, 11

QIAGEN s NGS Solutions for Biomarkers NGS & Bioinformatics team QIAGEN (Suzhou) Translational Medicine Co.,Ltd

The ENCODE Encyclopedia. & Variant Annotation Using RegulomeDB and HaploReg

GENETICS 1 Classification, Heredity, DNA & RNA. Classification, Objectives At the end of this sub section you should be able to: Heredity, DNA and RNA

Introduction to ChIP Seq data analyses. Acknowledgement: slides taken from Dr. H

Read Quality Assessment & Improvement. UCD Genome Center Bioinformatics Core Tuesday 14 June 2016

Microarrays: since we use probes we obviously must know the sequences we are looking at!

Next Gen Sequencing. Expansion of sequencing technology. Contents

PrimePCR Assay Validation Report

DNA Transcription. Visualizing Transcription. The Transcription Process

Multiple choice questions (numbers in brackets indicate the number of correct answers)

Product selection guide Ion GeneStudio S5 Series

Research school methods seminar Genomics and Transcriptomics

CM581A2: NEXT GENERATION SEQUENCING PLATFORMS AND LIBRARY GENERATION

Division Ave. High School AP Biology

Analysis of Differential Gene Expression in Cattle Using mrna-seq

Variant Detection in Next Generation Sequencing Data. John Osborne Sept 14, 2012

ACCEL-NGS 2S DNA LIBRARY KITS

Transcription:

B&DA Committee Bioinformatics and Data Analysis PAG January 2016

B&DA progress so far Aim: to define standard bfx pipelines for FAANG data Group has skyped many times We have a Wiki: hosted by EBI Sub-groups RNA: RNA-Seq, mirna-seq, full length etc ChIP: ChIP pull down of things bound to DNA Methylation: pipelines to identify methylated regions 3D structure: HiC and other methods Barriers: lack of data on which pipeline is best; seeking to compare pipelines; seeking suitable datasets

Analysis issues Do we want fuzzy or hard pipelines? If we don t use the same pipelines, when we compare data, we will just find pipeline differences However, we don t want to take years finding the perfect pipeline (there isn t one anyway!) Is a single pipeline enforceable? Can we afford for it not to be?

Other committees Are we communicating with the other FAANG committees in the right way? Steering committee Animals, Samples and Assays (ASA) Bioinformatics and Data Analysis (B&DA) Communication (COM) Metadata and Data Sharing (M&DS)

Stimulating collaborations How do we stimulate integration and development of future collaborative projects? How to we facilitate future activities? What do we need? Funding? Meetings? Access to compute infrastructure? Access to data?

Impact What are translational impacts of FAANG work on animal production and human health by comparative genomics? If we do this project correctly, what will be possible in the future because of it? What is the impact of what we are doing?

REPORTS FROM 4 SUB- COMMITTEES

CHROMATIN STRUCTURE

B&DA Committee Bioinformatics and Data Analysis Chromatin Structure PAG January 2016 Motivation Gene expression can be regulated by modification of chromatin compaction and 3D distance between loci => interest in profiling chromatin structure & genome topology Molecular assays DNAse-seq, ATAC-seq, Hi-C Activities of the subcommittee List available tools and datasets, set up and test pipelines, define QC metrics and standard procedures

DNAse-seq Targets DNase I hypersensitive sites: open chromatin, regulatory elements, transcribed regions

DNAse-seq Analysis pipeline - ongoing work read mapping & trimming (bwa) remove duplicates and low-qual (picard, samtools) peak calling (MACS2, HotSpot) Reference datasets livestock: none identified model organisms: several available from ENCODE, Blueprint, Fantom... => build/test pipeline on model organisms data first

ATAC-seq Buenrostro et al 2013 Transposase-mediated insertion of sequencing adapters in open chromatin simple protocol (no ligation) low requirements of initial material similar to DNAse-seq

ATAC-seq Analysis pipeline - ongoing work read trimming (trim_galore/cutadapt) read mapping (bowtie2) remove duplicates and low-qual (picard, samtools) peak calling (MACS2) Reference datasets model organisms: human GM12877 cells (from Buenrostro et al 2013) livestock: none publicly available but several in production in pig, cattle, chicken and goat (FAANG French pilot project FR- AgENCODE) => preliminary results: poster P0420

Hi-C Lieberman Aiden et al 2009, Rao et al 2014 genome-wide detection of proximal pairs of loci in 3D nuclear space generates contact matrix detects inter- and intra-chromosomal interactions

Hi-C Analysis pipeline - ongoing work read mapping and trimming (bowtie2, cutadapt) remove duplicates, low-qual and inconsistent mapping (samtools) generate and normalize contact matrix (ICE) identify Topologically Associated Domains workflow implementation: HiC-Pro, HiTC, HiFive Reference datasets model organisms: human IMR90 (Dixon et al 2012), mouse CH12 (Rao et al 2014) livestock: none publicly available but several in production in pig, chicken, cattle and goat (FAANG French pilot project FR- AgENCODE) => preliminary results: poster P0420

METHYLATION

B&DA Committee Bioinformatics and Data Analysis Methylation PAG January 2016 Presenter: Kyle Schachtschneider and Ole Madsen

Methylation 35 members Methylation group (January 2016) Aiming at one group meeting every month Decisions on pilot phase made Data Tissue(s) Species Pipelines to test Needed? Data to analyse

Methylation PILOT FASE DATA Whole genome bisulfite sequencing Reduced representation bisulfite sequencing (RRBS) Aiming at 30X for both types of data Tissue(s) Liver from (healthy) adults (maybe also from embryo) Minimum of three biological replicates Species Chicken (bird) and a mammal (Cow, pig, sheep?) Pipelines BLUEPRINT pipeline (provided by Simon Heath) Toulouse pipeline (http://ng6.toulouse.inra.fr/, NG6) Other(s)...?

Methylation PILOT FASE info available on Confluence Optimal fragment size/species RRBS (Toulouse) Genomes used: capra_hircus: CHIR_1.0 gallus_gallus: Galgal4.80 mus_musculus: GRCm38.p4.81 sus_scrofa: Sscrofa10.2.80 bos_taurus: UMD3.1.80 Enzyme used: Msp1 Taq1 Both (double digestion) Software tools for Methylation analysis

CHIP-SEQ

ChIP-seq Data Analysis Standards To agree and define standard pipelines for the analysis of FAANG ChIP-seq data January 11 th FAANG Workshop at PAGXXIV

Data Acquisition Standards Sequencing Single end 50 bp Read coverage: Narrow peaks: 20 million uniquely mapped reads Broad peaks: 40 million uniquely mapped reads Input from same sonication batch should be sequenced at similar depth of ChIP sample Biological replicates (at least 2 should be sex matched)

ChIP-seq analysis pipelines Filter poor quality reads Align reads to genome Filter artifacts and multiple mapped reads Call peaks Determine consistency across replicates ENCODE ChIP-Seq pipeline: https://github.com/encode- DCC/chip-seq-pipeline UC Davis ChIP-Seq pipeline: https://github.com/kernco/chipseq-pipeline Your pipeline: https://github.com/please SUBMIT ASAP

UC Davis pipeline: https://github.com/kernco/chipseq-pipeline Make file: $ make -f chipseq.makefile MARK=H3K4me3 INPUT=input ASSEMBLY=assembly.fa Requires 2 data files: ChIPed-sample.fastq and Inputsample.fastq; and a genome assembly.fa Trim raw reads (Illumina adapters and quality trimming- Trimmomatic) Align with BWA Mark duplicate alignments (picardtools) Peak calling using MACS2 (narrow and broad) IDR - Irreproducible Discovery Rate

ChIP-seq Analysis group meeting Tuesday 2PM Devonshire Room Define a strategy to test ChIP-seq pipelines uploaded to the wiki

RNA

FAANG-analysis RNA - Aims Define bioinformatics pipelines for: Transcript discovery for animal genomes location strand isoforms function Quantify gene expression

FAANG-analysis RNA - Data Short read RNA-seq CAGE PolyA-seq Long read (PacBio)

FAANG-analysis RNA - Function protein-coding pseudogene mirna trna lncrna snorna rrna...

Why do we want to do this? Transcript biotypes in Ensembl v83

Want to get from here...

... to somewhere here

What we need Data to benchmark Pipelines for transcript reconstruction Reference guided (example in FAANG wiki) De-novo (example in FAANG wiki) Mapping PacBio long read data Pipelines for quantification Performance comparison of software tools (e.g. Roberts & Watson, 2015) for RNA-seq CAGE Pipelines for functional classification

Taking things forward... Informal meeting tomorrow Venue: Dover Room Time: 9:30-11:30

Thank you! Join us: http://www.faang.org/groups?name= analysis