Course Presentation. Ignacio Medina Presentation

Similar documents
Analytics Behind Genomic Testing

NGS in Pathology Webinar

2017 HTS-CSRS COMMUNITY PUBLIC WORKSHOP

BICF Variant Analysis Tools. Using the BioHPC Workflow Launching Tool Astrocyte

Introduction of RNA-Seq Analysis

A NGS data analysis course

Session 8. Differential gene expression analysis using RNAseq data

NGS part 2: applications. Tobias Österlund

Sequencing applications. Today's outline. Hands-on exercises. Applications of short-read sequencing: RNA-Seq and ChIP-Seq

Sanger vs Next-Gen Sequencing

Applications of short-read

RNA-Seq with the Tuxedo Suite

G E N OM I C S S E RV I C ES

RNA-Seq Module 2 From QC to differential gene expression.

Top 5 Lessons Learned From MAQC III/SEQC

Long and short/small RNA-seq data analysis

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

Analysis of RNA-seq Data. Feb 8, 2017 Peikai CHEN (PHD)

Analysis of neo-antigens to identify T-cell neo-epitopes in human Head & Neck cancer. Project XX1001. Customer Detail

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis

RNA-seq Data Analysis

RNA-Seq. Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University

The Sentieon Genomic Tools Improved Best Practices Pipelines for Analysis of Germline and Tumor-Normal Samples

CNV and variant detection for human genome resequencing data - for biomedical researchers (II)

RNAseq Differential Gene Expression Analysis Report

Transcriptomics analysis with RNA seq: an overview Frederik Coppens

Galaxy for Next Generation Sequencing 初探次世代序列分析平台 蘇聖堯 2013/9/12

Eucalyptus gene assembly

NGS Data Analysis and Galaxy

UAB DNA-Seq Analysis Workshop. John Osborne Research Associate Centers for Clinical and Translational Science

Bioinformatics Monthly Workshop Series. Speaker: Fan Gao, Ph.D Bioinformatics Resource Office The Picower Institute for Learning and Memory

Next Generation Sequencing

Data Analysis with CASAVA v1.8 and the MiSeq Reporter

Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis. Jenny Wu

Introduction to iplant Collaborative Jinyu Yang Bioinformatics and Mathematical Biosciences Lab

RNA-sequencing. Next Generation sequencing analysis Anne-Mette Bjerregaard. Center for biological sequence analysis (CBS)

Introduction to RNAseq Analysis. Milena Kraus Apr 18, 2016

Transcriptome Assembly and Evaluation, using Sequencing Quality Control (SEQC) Data

SCALABLE, REPRODUCIBLE RNA-Seq

Benchmarking of RNA-seq data processing pipelines using whole transcriptome qpcr expression data

RNA-Seq Analysis. Simon Andrews, Laura v

High performance sequencing and gene expression quantification

Quantifying gene expression

Centro Nacional de Análisis Genómico. Where are the Bottlenecks of Genome Analysis Today? Teratec. Ecole Polytechnique, Palaiseau, F.

Welcome to the NGS webinar series

From Lab Bench to Supercomputer: Advanced Life Sciences Computing. John Fonner, PhD Life Sciences Computing

Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager

Introduction to RNA-Seq in GeneSpring NGS Software

De novo assembly in RNA-seq analysis.

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel.

Pioneering Clinical Omics

Prioritization: from vcf to finding the causative gene

C3BI. VARIANTS CALLING November Pierre Lechat Stéphane Descorps-Declère

How to deal with your RNA-seq data?

Globus Genomics at GSI Boston University. Dinanath Sulakhe, Alex Rodriguez

RNA-Seq Software, Tools, and Workflows

Mapping Next Generation Sequence Reads. Bingbing Yuan Dec. 2, 2010

Arraygen Technologies Pvt. Ltd.

Using VarSeq to Improve Variant Analysis Research

Variant detection analysis in the BRCA1/2 genes from Ion torrent PGM data

RNA-SEQUENCING ANALYSIS

Introduction to BIOINFORMATICS

Galaxy Platform For NGS Data Analyses

Introduction to DNA-Sequencing

GeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference owing to RNA-seq reads misalignment

Genomic Technologies. Michael Schatz. Feb 1, 2018 Lecture 2: Applied Comparative Genomics

BST 226 Statistical Methods for Bioinformatics David M. Rocke. March 10, 2014 BST 226 Statistical Methods for Bioinformatics 1

Introduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017

Sequence Analysis 2RNA-Seq

Variant calling in NGS experiments

Whole Genome, Exome, or Custom Targeted Sequencing: How do I choose? Aaron Thorner, PhD Clinical Genomics Group Leader

Setting Standards and Raising Quality for Clinical Bioinformatics. Joo Wook Ahn, Guy s & St Thomas 04/07/ ACGS summer scientific meeting

DATA FORMATS AND QUALITY CONTROL

Transcriptome Assembly, Functional Annotation (and a few other related thoughts)

Transcriptome analysis

Variant prioritization in NGS studies: Annotation and Filtering "

From raw reads to variants

Genomic Data Analysis Services Available for PL-Grid Users

The experience of Andalusia in ehealth and big data --- Big data in health - IMI s HARMONY project, European Parliament, Brussels, 19 June 2018

Bioinformatics Core Facility IDENTIFYING A DISEASE CAUSING MUTATION

Read Mapping and Variant Calling. Johannes Starlinger

Short Read Alignment to a Reference Genome

COMPUTATIONAL PREDICTION AND CHARACTERIZATION OF A TRANSCRIPTOME USING CASSAVA (MANIHOT ESCULENTA) RNA-SEQ DATA

Functional Annotation and Prioritization of Whole Exome and Whole Genome Sequencing Variants. Mulin Jun Li

ISO/IEC JTC 1/SC 29/WG 11 N15527 Warsaw, CH June Introduction

Variant Finding. UCD Genome Center Bioinformatics Core Wednesday 30 August 2016

Functional Genomics Research Stream. Research Meeting: June 5, 2012 Summer Goals

Next Generation Sequencing. Tobias Österlund

Biology 644: Bioinformatics

Introduction to NGS analyses

Herramientas para el diseño y el análisis de datos de paneles de genes

Elixir: European Bioinformatics Research Infrastructure. Rolf Apweiler

BABELOMICS: Microarray Data Analysis

Data Basics. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis

Introduction to RNA sequencing

res2)=)colocalisation(eqtl_file="icoslg_eqtl.txt",) gwas_file="uc_gwas_icoslg_locus.txt",)p1)=)1e+4,)p2)=)1e+4,)p12)=)1e+5,) region)=)200000)$

IDENTIFYING A DISEASE CAUSING MUTATION

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

Whole Genome Sequencing in Cancer Diagnostics (research) Nederlandse Pathologiedagen 19 & 20 November 2015

Optimal Calculation of RNA-Seq Fold-Change Values

Transcription:

Course

Index Introduction Agenda Analysis pipeline Some considerations

Introduction Who we are Teachers: Marta Bleda: Computational Biologist and Data Analyst at Department of Medicine, Addenbrooke's Hospital (University of Cambridge) : Head of Computational Biology Lab, HPCS, University of Cambridge, UK Team Leader for Bioinformatic Software Development, Genomics England, London, UK Scientific Collaborator at EMBL-EBI Variation team (Cambridge, UK) Everything started at Joaquin Dopazo's group at CIPF: http://bioinfo.cipf.es/ More than 10 years of experience in microarrays & NGS data analysis and developing methodologies and bioinformatics tool for data analysis. Many suites and tools developed: GEPAS, Babelomics, Genome Maps, BierApp, VARIANT,... More than 60 papers in the last 8 years in peer reviewed journals: NAR, Bioinformatics, Nat. Biotech., Many active collaborations with experimental and clinic groups Many international courses run during last years: Massive Data Analysis (MDA)

Introduction Goals, ambitious To learn the basics to understand and be able to conduct a standard NGS data analysis from scratch in a Linux environment To know and understand the different data analysis pipelines and formats (FASTQ, SAM/BAM, VCF) To preprocess and perform QC of raw and processed data To learn and use the most widely used tools to perform NGS data analysis and visualization To learn the basics of the functional interpretation of variant (DNA re-sequencing) To understand the limitation of current technologies and know where are we heading Optional, learn how to install NGS software in Linux and how to tune up data analysis pipelines by simulating data

Program First day 09:00 Welcome and Course presentation 09:30 Introduction to Next Generation Sequencing (NGS) 10:15 Introduction to NGS data analysis & GNU/Linux shell 10:45 Coffee Break 11:00 FastQ Quality Control for NGS Raw Data (theory & hands-on) 12:30 Lunch Break 13:30 Mapping NGS Reads for Genomic (theory & hands-on) 15:00 Coffe Break 15:15 Mapping NGS Reads for Genomic (theory & hands-on) 16:00 Mapping Quality Control and Visualization (theory and hands-on) 17:00 Optional: A more advanced Linux session. NGS software Installation (1h)

Program Second day 09:00 Germline Variant Calling and variant visualization (theory & hands-on) 10:30 Coffee Break 10:45 Somatic Variant Calling and discovery of de novo mutations 12:00 Variant filtering 12:30 Lunch Break 13:30 Variant Annotation (theory & hands-on) 14:15 OpenCGA Variant Storage 15:15 Tea Break 15:30 Rare variant association tests: single variant and burden analysis 16:30 Troubleshooting and help with own data analysis??:?? Finish

Analysis pipeline Exome and Genome Analysis (DNA) Fastq Sequence preprocessing Fastq Mapping BAM Visualization BAM FastQC, Cutadapt DNA: BWA, HPG Aligner,... RNA: TopHat, STAR, HPG Aligner IGV, Genome Maps RNA-seq Analysis (RNA) GATK MuTect Ensembl VEP Annovar CellBase Variant calling VCF Variant annotation RNA-Seq processing Count matrix RNA-Seq data analysis CuffLinks CuffDiff BierApp,... Prioritization Functional analysis Babelomics

Analysis pipeline Aligning reads, the coverage Coverage (read depth or depth) is the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome (G), the number of reads(n), and the average read length(l) as NxL/G. For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2x redundancy. Useful for: Error sequencing detection Copy number detection Genotyping... Current re-sequencing projects target to 40x depth Mapping NGS reads for genomic studies

Analysis pipeline paired-end vs single-end alignment Paired-end sequencing: Improves read alignment and therefore variant calling Helps to detect structural variation Can detect gene fusions and splice junctions Useful for de novo assembly... Mapping NGS reads for genomic studies

Some considerations NGS data can be big, very big, huge! Biology is now a Big Data science There are not many web-based or graphical applications to perform analysis yet, sorry. Most tools developed to work on Linux, many command line programs How to work in NGS? Small datasets (<1TB): workstations Medium sized datasets (<100TB): clusters Big datasets (100TB-20PB): big clusters and/or cloud based solutions Exercises during this course the NGS alignment will be done using the human chromosome 21 as a reference genome. By doing this we can speed up exercises and avoid using too much memory. Under real circumstances, when using the standard reference genome, all the commands are exactly the same Software has been already installed to save time, so you are not expected to download and install all the software it is going to be used. However, it's usually good to learn the basics of software installation in Linux, there is an optional session at the end of the first day for those that want to learn how to install NGS software in a standard Linux

What about you? Brief presentation Who are you? Which is your background? Which is your interest? What do you expect of this course?