Multi-omics in biology: integration of omics techniques

Similar documents
Single cell RNA-sequencing: advantages and limitations

DNA. bioinformatics. epigenetics methylation structural variation. custom. assembly. gene. tumor-normal. mendelian. BS-seq. prediction.

What Can the Epigenome Teach Us About Cellular States and Diseases?

NGS Approaches to Epigenomics

Introduction to the UCSC genome browser

Complete Sample to Analysis Solutions for DNA Methylation Discovery using Next Generation Sequencing

2/19/13. Contents. Applications of HMMs in Epigenomics

Applications of HMMs in Epigenomics

Reads to Discovery. Visualize Annotate Discover. Small DNA-Seq ChIP-Seq Methyl-Seq. MeDIP-Seq. RNA-Seq. RNA-Seq.

About Strand NGS. Strand Genomics, Inc All rights reserved.

Introduction to human genomics and genome informatics

What we ll do today. Types of stem cells. Do engineered ips and ES cells have. What genes are special in stem cells?

Do engineered ips and ES cells have similar molecular signatures?

2/10/17. Contents. Applications of HMMs in Epigenomics

Gene Regulation Solutions. Microarrays and Next-Generation Sequencing

GREG GIBSON SPENCER V. MUSE

Gene expression analysis. Biosciences 741: Genomics Fall, 2013 Week 5. Gene expression analysis

Genetics and Bioinformatics

resequencing storage SNP ncrna metagenomics private trio de novo exome ncrna RNA DNA bioinformatics RNA-seq comparative genomics

Welcome to the NGS webinar series

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

Biology 644: Bioinformatics

Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017

Cancer Genetics Solutions

Bioinformatics Advice on Experimental Design

Introduction to genome biology

Introduction to NGS analyses

G E N OM I C S S E RV I C ES

Galaxy Platform For NGS Data Analyses

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

Introduction to BIOINFORMATICS

Introduction to genome biology

GATCGTGCACGATCTCGGCAATTCGGGATGCCGGCTCGTCACCGGTCGCT

ChIP-seq and RNA-seq. Farhat Habib

Satellite Education Workshop (SW4): Epigenomics: Design, Implementation and Analysis for RNA-seq and Methyl-seq Experiments

Impact of gdna Integrity on the Outcome of DNA Methylation Studies

Proteogenomics Workflow for Neoantigen Discovery

WORKSHOP. Transcriptional circuitry and the regulatory conformation of the genome. Ofir Hakim Faculty of Life Sciences

02 Agenda Item 03 Agenda Item

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

Introduction to RNA-Seq in GeneSpring NGS Software

Machine Learning. HMM applications in computational biology

Ecological genomics and molecular adaptation: state of the Union and some research goals for the near future.

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel.

Go to Bottom Left click WashU Epigenome Browser. Click

Bio5488 Practice Midterm (2018) 1. Next-gen sequencing

Frumkin, 2e Part 1: Methods and Paradigms. Chapter 6: Genetics and Environmental Health

Week 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers


GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

Deep Sequencing technologies

ChIP-seq and RNA-seq

Pioneering Clinical Omics

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies

Course Presentation. Ignacio Medina Presentation

Computational Challenges of Medical Genomics

FFPE in your NGS Study

Next-generation sequencing technologies

Sanger vs Next-Gen Sequencing

DNA METHYLATION RESEARCH TOOLS

The Human Genome and its upcoming Dynamics

Genomic resources. for non-model systems

Introduction to Bioinformatics

DIAMANTINA INSTITUTE for Cancer, Immunology and Metabolic Medicine

Lecture 5: Regulation

Discovering similar (epi)genomics feature patterns in multiple genome browser tracks

The ENCODE Encyclopedia. & Variant Annotation Using RegulomeDB and HaploReg

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR

RNAseq Applications in Genome Studies. Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford

Single-cell sequencing

RNA-SEQUENCING ANALYSIS

Genomic data visualisation

NimbleGen Arrays and LightCycler 480 System: A Complete Workflow for DNA Methylation Biomarker Discovery and Validation.

MODULE TSS1: TRANSCRIPTION START SITES INTRODUCTION (BASIC)

Introduction to Bioinformatics and Gene Expression Technology

Supplemental Methods. Exome Enrichment and Sequencing

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

Green Center Computational Core ChIP- Seq Pipeline, Just a Click Away

Introducing QIAseq. Accelerate your NGS performance through Sample to Insight solutions. Sample to Insight

ChIP-seq analysis 2/28/2018

BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

Why DNA Isn t Your Destiny

Next- genera*on Sequencing. Lecture 13

Agilent NGS Solutions : Addressing Today s Challenges

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

What is Bioinformatics?

PeCan Data Portal. rnal/v48/n1/full/ng.3466.html

Proteogenomics. Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Centro Nacional de Análisis Genómico. Where are the Bottlenecks of Genome Analysis Today? Teratec. Ecole Polytechnique, Palaiseau, F.

Whole genome sequencing in the UK Biobank

QIAGEN s NGS Solutions for Biomarkers NGS & Bioinformatics team QIAGEN (Suzhou) Translational Medicine Co.,Ltd

Supplemental Figure 1.

Microbial Metabolism Systems Microbiology

Measuring Protein-DNA interactions

Permutation Clustering of the DNA Sequence Facilitates Understanding of the Nonlinearly Organized Genome

Experimental validation of candidates of tissuespecific and CpG-island-mediated alternative polyadenylation in mouse

DNA METHYLATION NH 2 H 3. Solutions using bisulfite conversion and immunocapture Ideal for NGS, Sanger sequencing, Pyrosequencing, and qpcr

Genomic solutions for complex disease

Targeted RNA sequencing reveals the deep complexity of the human transcriptome.

Transcription:

31/07/17 Летняя школа по биоинформатике 2017 Multi-omics in biology: integration of omics techniques Konstantin Okonechnikov Division of Pediatric Neurooncology German Cancer Research Center (DKFZ)

2 Short prefix: use scientific language Manuscript standard Get used to it: Next Generation Sequencing = Секвенирование нового поколения ~ НГС Paired-end reads = парные прочтения ~ риды Single nucleotide polymorphism = Однонуклеотидный полиморфизм ~ СНиП Presentation slides are in English

3 Multi-omics origins Main breakthrough in molecular biology: ability to get signals from the majority of cell components

4 Available methods PCR - > DNA, RNA Sanger sequencing -> DNA, RNA Arrays -> DNA, RNA, methylation Spectrometry -> proteins High Throughput Sequencing -> large variety of cell areas

5 Complex process example: cancer Changes in functionality of a cell Cause: unfixed DNA damage Mutations Insertions, deletions Translocations

6 Complex process example: cancer Development from normal cell to malignant Formed oncogenes Broken repair and tumor suppressor genes Changes in epigenomic activity How to find the cause and understand the process?

7 Multi-omics approach Integeration of different detected changes : Genotype <-> phenotype Cluster and form groups Find possible causative factors Focus on HTS: Genomics Transcriptomics Epigenomics Y. Hasin et al Genome Biology 5.2017

8 High Throughput Sequencing Protocol properties Base analysis Alignment Assembly J. M. Churko et al CIRCRESAHA.113.300939

9 Genomics Whole genome/exome sequencing Single nucleotide polymorphisms/variants Population control Structural variations (InDels) Copy number changes Robinson, J. T., et al. Nature biotechnology 29.1 (2011): 24-26.

10 WGS/ES properties Procedure control (read length, single/paired, etc ) Coverage Exome : 100-200x, genome 20-40x Results annotation Genomic properties (gene type, known, etc) Amino acid impact and conservation Population frequency, clinical indication,. Meyerson M, Gabriel S, Getz G. Nat Rev Genet. 2010 Oct;11(10):685-96. Review.

11 Transcriptomics Microarrays, RNA-sequencing Gene expression analysis Various types of RNA: small, non-coding,. Alternative splicing and isoforms Fusion genes, chimeric transcscripts K. Okonechnikov et al 2016 PLOS One.

12 Gene expression profiling Specific quality control <- important for each type of data Covered genes, 5-3 bias, etc.. Normalization: RPKM, size factor, etc Batch effect: source of samples

13 Epigenomics Changes in gene expression without changes in genome http://www.roadmapepigenomics.org

14 DNA methylation Epigenetic mark 5-methylcytosine (5mC), affects up to 80% of CpGs in genome Highly active in promoters Detection is performed on CpG cites Day and Sweatt, Nat Neuro, 2010

15 Methylation marks 450k arrays, whole genome bisulfite sequencing Genomic imprinting Differentially methylated regions DNA methylation valleys K. W. Pajtler, Cancer Cell (2015) V27, I5, Pages 728-743

16 Histone modifications Promoters: DNA segments of TF binding sites Enhancers: transcriptional activation/deactivation of target genes Core modifications: H3K4me3 - active promoters, H3K4me1, H3K27ac enhancers, A. Mora et al Brief Bioinform (2015) doi: 10.1093/bib/bbv097

17 Histone activity variance ChIP-sequencing (TFs, RNA Pol II, H3K27ac, ): Detect active promoters and enhancers Investigate the transcription factor activity Find novel epigenetic patterns P.J. Park Nat Rev Genet. 2009 Oct; 10(10): 669 680

18 Chromatin interactions 3C, 4C, HiC sequencing Chromosome conformation capture Spatial organization of genome: connection matrix Topologically Associated Domains J Dekker, MA Marti-Renom, LA Mirny - Nature reviews. Genetics, 2013

19 Initial data analysis Main task: discovery of possible markers and functional points of biological process Standard pipelines: Command line tools task specific (learn, discuss, check review manuscripts) Visualization (IGV, UCSC genome browser, CIrcos ) Workflow management (Galaxy, Unipro UGENE,..) Public databases and online resources: Ensembl, GENCODE - stabilize for all data types ROADMAP <depends on research focus >

20 Summarizing results per sample Data availability Specific formats # Group ID WGS WGBS RNA-seq smallrnaseq H3K27me 9 WT AK_417 ICGC_GBM9 GBM9 ICGC_GBM9 ICGC_GBM9 DKFZ_0948 16 K27M ICGC_GBM16 ICGC_GBM16 ICGC_GBM16 ICGC_GBM16 DKFZ_1072 DKFZ_1184 27 K27M ICGC_GBM27 ICGC_GBM27 GBM27 ICGC_GBM27 ICGC_GBM27 DKFZ_1427 62 G34R ICGC_GBM62 ICGC_GBM62 GBM62 ICGC_GBM62 63 G34R ICGC_GBM63 ICGC_GBM63 GBM63 ICGC_GBM63 Gene S1 S2 S3 S4 TMPRSS2 0.234 1.2342 3.34 7.23 ERG 0.723 2.34 5.23 0.23 75 G34R 2071 ICGC_GBM75 GBM75 76 G34R AK178 PNET_178 ICGC_GBM76 DKFZ_0532 Gene expression per sample 77 G34R AK40 ICGC_GBM77 AK40 ICGC_GBM77 DKFZ_0392 78 WT Glio_1.3 ICGC_GBM78 Gliom1_3 ICGC_GBM78 DKFZ_0323 Samples overview chr start end Enrichment (nrpm) Type chr6 33986957 34116956 299,9438024 SSSEA chr7 44254138 44335962 223,1921727 SSSEA chr12 3311159 3389177 190,9766377 SSSEA chr17 77404069 77486837 170,4074676 Enhancer chr3 10484586 10560420 116,2546028 Enhancer chr1 17019975 17045474 107,8777923 SSSEA Enhancer signals per sample Sample mutations

21 Multi-omics application: medulloblastoma Highly malignant tumour, mostly in young children Characterized by medical and genetic factors Extremely low mutation rate

22 Clustering to find groups Methods: Hierarchical clustering, Principal Component Analysis, Resources: RNA-seq, methylation (more stable) P. Northcott et al Journal of Clinical Oncology 29, no. 11 2011

23 Combined clustering Combination: RNA-seq + 450k methylation PC2 Similarity Network Fusion PC1 Integrated Clustering results: Clear separation of G3 from G4 Additional subgroups F. M. Cavalli et all Cancer Cell, V1, Issue 6, 2017

24 Cluster specificity based on genomics WGS: Landscape of mutations, indels, etc specific for samples Methylation: Clustering based on t- Distributed Stochastic Neighbor Embedding Coordinates k-means to set groups Clustering verification: genomic patterns to specify the groups P. Northcott et al Nature 547, 311 317 (20 July 2017)

25 Methylation and expression Combination: WGBS + RNA-seq Correlation: regions of methylation with expression of overlapping genes V. Hovestadt et al Nature. 2014 Jun 26;510(7506)

26 Enhancer hijacking Combination: WGS + RNA-seq + ChIP-seq Structural variation in genomic with enhancers -> outlier overexpression of a gene The variation affects enhancers connection Supported by mouse model P. Northcott. et al. Nature 2014

27 Enhancer associated genes Combination: H3K27ac ChIP-seq + RNA-seq + HiC Focus on Topologically Associated Domains (IMR90) The connection of genes and enhancers Lin, C Y., Erkek S. et al. Nature (2016).

28 Transcription factor network Key players: enhancer enriched and dominant in expression transcription factors Lin, C Y., Erkek S. et al. Nature (2016).

29 Summary Multiple experimental methods allow to obtain various signals from cell activity Good example: high throughput sequencing. It allows to obtain various genomics, transcriptomics and epigenomics signals Combination of multiple technique - multi-omics - allows to confirm observations and find possible cause effects Various strategies can be applied for this including multidimensional clustering, correlation, effects explanation etc

30 Спасибо за внимание! Вопросы?