What Can the Epigenome Teach Us About Cellular States and Diseases?

(A computer scientist's view) Luca Pinello

Outline: 1. Epigenetics: the code over the code 2. What can we learn from epigenomic data? 3. Resources for epigenomic data & analysis

Human Genome Project. We are what we are thanks to our genes. Genes determine cellular state and disease, and can be used to make diagnoses and design therapies. The first draft of our genome became available in 2001.

Genes are not sufficient to explain the complexity of an organism

We need to learn to read the non-coding part of the genome (figure labels: Gene, Regulatory region?; adapted from: http://www.opont.hu/hirek3.php?k_hirazn=27234&k_hirfl=201112&k_hirkat=7).

Transcription factors. Transcription factors (TFs) are proteins that control which genes in the genome are turned on or off. Their activity determines how cells function and respond to their environment. We have many TFs (>1000).

OK, how can I find my spot? Single and multiple alignments, motif search tools
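
Motif search tools typically score every window of a sequence against a position weight matrix (PWM) built from aligned binding sites. A minimal sketch, assuming a hypothetical 4-position count matrix for a "GATA"-like motif, a uniform background and a pseudocount of 1; it scans the forward strand only.

```python
import math

BASES = "ACGT"

def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.get(base, 0) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm

def scan(sequence, pwm):
    """Score every window of the forward strand; returns (start, score) pairs."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hits.append((start, sum(pwm[i][base] for i, base in enumerate(window))))
    return hits

# Hypothetical counts from 10 aligned sites of a "GATA"-like motif.
toy_counts = [
    {"G": 9, "A": 1},
    {"A": 10},
    {"T": 9, "C": 1},
    {"A": 8, "G": 2},
]

pwm = counts_to_pwm(toy_counts)
hits = scan("CCGATAAGGATCC", pwm)
best_pos, best_score = max(hits, key=lambda hit: hit[1])
print(best_pos, round(best_score, 2))
```

Real scanners also score the reverse complement and convert raw scores into p-values before calling a site.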

DNA: a protein parking lot organized by sequences? A fundamental question is: is there a natural order dictated by the sequence, or are the binding locations of a protein dictated by other factors?

The Epigenetic revolution

Interest in epigenetics is still rising

Epigenetics. The term epigenetics can be used to describe anything other than DNA sequence that influences the development of an organism.

Same genotype, different phenotypes. Inheritance of the phenotype is presumably based on epigenetic modifications of the IAP (intracisternal A particle) element, which may include DNA methylation or chromatin packaging. Different diets in otherwise identical mice can determine glucose intolerance and obesity risk in their offspring.

I told you! Lamarck's theory of inheritance of acquired characters; Darwin's theory of natural selection. You can inherit something beyond the DNA sequence!

Epigenetics and gene regulation

Epigenetics and chromatin structure. Almost all the cells of our body share the same genome but have very different gene expression programs. (Adapted from: http://jpkc.scu.edu.cn/ywwy/zbsw(e)/edetail12.htm)

Chromatin structure: it's not static! (Adapted from: http://alexnabaum.blogspot.com/)

The code over the code. Chromatin structure and accessibility are mainly controlled by: 1. nucleosome positioning, 2. DNA methylation, 3. histone modifications. (Adapted from: The Cell Biology of Stem Cells, 2010)

Histone modifications. Specific histone modifications, or combinations of modifications, confer unique biological functions to the genomic regions associated with them (figure labels: Gene, Enhancer, Heterochromatin; adapted from Turner, Cell 2002).
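
To make "combinations of modifications" concrete, the sketch below maps the set of marks observed at a region to a coarse label. The rules are a simplified, hypothetical summary of commonly reported associations (H3K4me3 at promoters, H3K4me1 and H3K27ac at enhancers, H3K27me3 with Polycomb repression, H3K9me3 with heterochromatin, H3K36me3 over transcribed gene bodies); real annotations are learned from data rather than hard-coded.

```python
def annotate(marks):
    """Return a coarse functional label for a set of histone marks (toy, hand-written rules)."""
    marks = set(marks)
    if {"H3K4me3", "H3K27me3"} <= marks:
        return "bivalent promoter"
    if "H3K4me3" in marks:
        return "active promoter" if "H3K27ac" in marks else "promoter"
    if "H3K4me1" in marks:
        if "H3K27ac" in marks:
            return "active enhancer"
        if "H3K27me3" in marks:
            return "poised enhancer"
        return "primed enhancer"
    if "H3K27me3" in marks:
        return "Polycomb-repressed"
    if "H3K9me3" in marks:
        return "heterochromatin"
    if "H3K36me3" in marks:
        return "transcribed"
    return "unmarked"

for region in (["H3K4me1", "H3K27ac"], ["H3K4me1"], ["H3K4me3", "H3K27me3"], []):
    print(region, "->", annotate(region))
```

The same mark (H3K4me1) gets different labels depending on its partners, which is exactly why the combinatorial pattern matters.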

Histone Modifications are not static

Epigenetics: (part of) the control logic of the software

Sequence them all! ChIP-seq: transcription factors, histone modifications, nucleosomes, chromatin remodelers. Bisulfite-seq: DNA methylation. DNase-seq: open chromatin. RNA-seq: gene expression.
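
In bisulfite sequencing, unmethylated cytosines are converted (via uracil) and read as T, while methylated cytosines are protected and still read as C. A common per-CpG summary is therefore the fraction of reads showing C. A minimal sketch, assuming forward-strand C and T counts have already been extracted from aligned reads (the counts are made up) and a hypothetical minimum-coverage filter of 5 reads.

```python
def methylation_level(c_reads, t_reads, min_coverage=5):
    """Fraction of reads supporting methylation at one CpG, or None if coverage is too low."""
    coverage = c_reads + t_reads
    if coverage < min_coverage:
        return None
    return c_reads / coverage

# Hypothetical (position, C reads, T reads) triples for three CpGs.
sites = [(1000, 18, 2), (1042, 1, 19), (1100, 2, 1)]
levels = {pos: methylation_level(c, t) for pos, c, t in sites}
print(levels)
```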

Epigenetics and diseases? Deregulation of chromatin regulators/modifiers and of chromatin structure. Using epigenetic information to annotate genetic variants involved in disease.

The problem: what are the functional mechanisms underlying genetic variants and epigenetic alterations associated with complex traits and diseases? (Diagram labels: genetic variation: chromosome rearrangements, insertions/deletions, mutations, SNPs; non-coding RNA; disease; epigenetic variation: chromatin regulators, nucleosome positioning dysregulation, histone modification dysregulation, DNA methylation dysregulation.)

Epigenetics and disease. Deregulation of chromatin remodelers and modifiers, and aberrant patterns of histone modifications. Chromatin instability: although epigenetic changes do not alter the DNA sequence, altered chromatin can facilitate mutations and erroneous recombination. Hypermethylation of tumor suppressor genes; hypomethylation or hypoacetylation of oncogenes. Loss of imprinting: activation of the normally silenced allele of an imprinted gene.

Genetic and epigenetic variation in cancer: transcription factors and chromatin regulators (Suvà et al., Science 2013).

GWAS: you can find variants associated with different diseases, but...
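
At its core, a GWAS tests each variant for association with the trait. A minimal sketch of one classic single-SNP test, the allelic chi-square test on a 2×2 table of allele counts in cases and controls, using made-up counts; real studies typically use regression models with covariates and must correct for testing about a million variants (conventionally a genome-wide threshold of 5×10^-8).

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) on a 2x2 allele-count table; returns (chi2, p)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts: 1,000 cases and 1,000 controls (2,000 alleles each).
chi2, p = allelic_chi_square(case_alt=700, case_ref=1300, ctrl_alt=560, ctrl_ref=1440)
print(f"chi2={chi2:.2f}, p={p:.2e}, genome-wide significant: {p < 5e-8}")
```

Note that this toy variant is clearly associated yet still misses the genome-wide threshold, one reason GWAS needs large cohorts.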

We need to look into the "junk". Although 88% of trait/disease-associated SNPs (TASs) were intronic (45%) or intergenic (43%), TASs were not overrepresented in introns and were significantly depleted in intergenic regions.

Where should we look? Correlations between nearby variants (linkage disequilibrium) make it difficult to pinpoint the causal one(s). (Adapted from: Paul et al., BioEssays 2014)
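
The correlation between nearby variants is usually measured as linkage disequilibrium, for example r² between two biallelic SNPs: r² = D² / (pA(1 − pA) · pB(1 − pB)), with D = pAB − pA·pB. A minimal sketch, assuming phased haplotypes coded 0/1 at each SNP (the haplotypes below are invented).

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes [(allele_snp1, allele_snp2), ...]."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0

# Ten hypothetical haplotypes: the two SNPs almost always travel together.
haps = [(1, 1)] * 4 + [(0, 0)] * 5 + [(1, 0)]
print(round(ld_r2(haps), 3))
```

When r² is close to 1, an association signal at one SNP is nearly indistinguishable from the other, so statistics alone cannot pick the causal variant.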

Where should we look? We can use external functional annotations: conservation, enhancers, open chromatin... Which one should we use? In which cell type? (Adapted from: Paul et al., BioEssays 2014)
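
Using an annotation boils down to intersecting variant positions with annotated intervals, for example the open-chromatin peaks of one cell type. A minimal sketch using binary search, assuming sorted, non-overlapping, half-open intervals on a single chromosome with hypothetical coordinates; tools such as bedtools do this at genome scale.

```python
import bisect

def overlapping_snps(snp_positions, intervals):
    """Return SNPs falling inside any [start, end) interval.

    Assumes intervals are on one chromosome, sorted by start and non-overlapping.
    """
    starts = [start for start, _ in intervals]
    hits = []
    for pos in snp_positions:
        i = bisect.bisect_right(starts, pos) - 1   # last interval starting at or before pos
        if i >= 0 and pos < intervals[i][1]:
            hits.append(pos)
    return hits

# Hypothetical DHS peaks and GWAS SNP positions (0-based, half-open, BED-like).
dhs_peaks = [(100, 250), (400, 520), (900, 1000)]
snps = [50, 120, 260, 400, 519, 520, 950]
print(overlapping_snps(snps, dhs_peaks))
```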

Exploit epigenetic variability to highlight functional regions and regulators

Where to focus?

Exploit the cross-cell-type variability to find interesting regions. Cell type 1: boring. Cell type 2: what's that? Cell type 3: boring.
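
The idea can be sketched as ranking genomic bins by how much their signal varies across cell types and keeping the most variable ones. This is a simplified illustration with made-up, already-normalized signal values, not Haystack's actual implementation; real tracks would need normalization across samples first.

```python
import statistics

def most_variable_bins(signal, top_n):
    """Rank bins by the variance of their signal across cell types.

    signal maps bin name -> list of values, one per cell type.
    """
    ranked = sorted(signal, key=lambda b: statistics.pvariance(signal[b]), reverse=True)
    return ranked[:top_n]

# Hypothetical normalized signal in three cell types (type 1, type 2, type 3).
signal = {
    "bin_A": [1.0, 1.1, 0.9],   # similar everywhere: "boring"
    "bin_B": [0.2, 5.0, 0.3],   # high only in cell type 2: "what's that?"
    "bin_C": [3.0, 3.2, 2.9],   # high everywhere, but not variable
}
print(most_variable_bins(signal, top_n=1))
```

Note that bin_C has a strong signal but is not selected: the point is variability, not raw intensity.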

DHS and genetic variants. Tissue-selective enrichment of disease-associated variants within DHSs. Disease-associated variants systematically perturb transcription factor recognition sequences.
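
One way to quantify how a variant perturbs a TF recognition sequence is to compare the best motif score of the reference and alternate alleles. A minimal sketch, assuming a hypothetical probability matrix for a "GATA"-like motif, a uniform background, a made-up SNP, and forward-strand scanning of only the windows that cover the variant.

```python
import math

# Hypothetical position probability matrix for a 4-bp "GATA"-like motif.
PPM = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
# Log2-odds against a uniform (0.25) background.
PWM = [{base: math.log2(p / 0.25) for base, p in col.items()} for col in PPM]

def best_score_around(seq, snp_index, pwm=PWM):
    """Best forward-strand score over windows containing snp_index.

    Assumes at least one full window covers the variant.
    """
    width = len(pwm)
    scores = []
    for start in range(max(0, snp_index - width + 1), min(snp_index, len(seq) - width) + 1):
        window = seq[start:start + width]
        scores.append(sum(pwm[i][base] for i, base in enumerate(window)))
    return max(scores)

def variant_delta(ref_seq, snp_index, alt_base):
    """Alt minus ref best score; negative values suggest the variant weakens the site."""
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    return best_score_around(alt_seq, snp_index) - best_score_around(ref_seq, snp_index)

ref = "TTGATAAC"   # hypothetical reference context containing GATA
print(round(variant_delta(ref, snp_index=3, alt_base="C"), 2))   # A -> C inside the motif
```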

DHS and partitioning heritability of regulatory variants. Across the 11 diseases, DNase I hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of the SNP heritability (h2g) from imputed SNPs (5.1× enrichment; p = 3.7 × 10^-17).
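
Enrichment here compares the share of heritability explained by an annotation with the share of SNPs it contains. A minimal sketch of that ratio; plugging in the averaged figures (79% of SNP heritability from 16% of imputed SNPs) gives about 4.9. The gap from the reported 5.1× is not surprising, since an average of per-disease ratios need not equal the ratio of averaged fractions.

```python
def heritability_enrichment(frac_h2, frac_snps):
    """Share of SNP heritability in an annotation divided by its share of SNPs."""
    if not 0 < frac_snps <= 1:
        raise ValueError("frac_snps must be in (0, 1]")
    return frac_h2 / frac_snps

# Averaged DHS figures: 79% of SNP heritability from 16% of imputed SNPs.
print(round(heritability_enrichment(0.79, 0.16), 2))
```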

Exploit the cross-cell-type variability to find interesting regions. Goal: find non-redundant, cell-type-specific regulators and functional regions. We used 19 ChIP-seq datasets for H3K27me3 from the ENCODE project and validated a novel TF in blood development.
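
Once a variable region is found, one can ask in which cell type it stands out. A minimal sketch, assuming made-up signal values and using a z-score of each cell type's value relative to the mean and standard deviation across all cell types, with a hypothetical cutoff of 1.5. This is one illustrative specificity score, not necessarily the one used in the analysis above.

```python
import statistics

def specificity_z(values):
    """z-score of each cell type's value relative to the mean/sd across all cell types."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]

cell_types = ["type_1", "type_2", "type_3", "type_4"]
region_signal = [0.5, 4.0, 0.7, 0.6]        # hypothetical signal for one region
z = specificity_z(region_signal)
specific = [ct for ct, score in zip(cell_types, z) if score > 1.5]
print(specific)
```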

Haystack pipeline. Using this pipeline, we predicted and experimentally validated regions and novel transcription factors important in blood development.

Haystack integrative analysis: H3K27ac and gene expression of lymphoblastoid cell lines from 19 individuals (data from Kasowski et al., Science 2013).
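
A simple starting point for integrating a mark with gene expression across individuals is a per-pair correlation. A minimal sketch, assuming hypothetical H3K27ac and expression values for one enhancer/gene pair in six individuals; it illustrates the idea only and is not a description of Haystack's integrative module.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical values for one enhancer/gene pair in six individuals.
h3k27ac = [2.1, 3.5, 1.2, 4.0, 2.8, 3.9]
expression = [10.0, 15.2, 7.9, 17.5, 12.1, 16.8]
print(round(pearson(h3k27ac, expression), 3))
```

With only a handful of individuals, a high correlation should still be checked with a significance test before reading it as regulation.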

Use Haystack on your data! It exploits the variability across samples to find non-redundant, cell-type- or sample-specific regulators and functional regions. A Python package called HAYSTACK implements our pipeline: github.com/lucapinello/haystack and hub.docker.com/r/lucapinello/haystack_bio/

We have many histone modifications. Idea: we need a way to summarize the combinatorial patterns of multiple histone marks.

Scaling up: chromatin states (http://compbio.mit.edu/chromhmm/). Chromatin states are defined by different combinations of histone modifications and correspond to different functional regions. The goal is to segment the genome into biologically meaningful units.

How can we learn the combinatorial code? ChromHMM quantifies the presence or absence of each mark in bins of fixed size along the genomic sequence (figure: binary calls such as 1 1 0 1... for each mark along the genome).
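
The presence/absence calls can be sketched as a per-bin Poisson test of read counts against a background rate. A minimal sketch, assuming made-up counts for one mark in consecutive fixed-size bins, a uniform background equal to the mean count per bin, and a threshold of 1e-4; ChromHMM's own binarization is also based on a Poisson background, but its exact options should be taken from the ChromHMM manual, not from this code.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam) with lam > 0, summing log-space terms to avoid overflow."""
    cdf = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)

def binarize(counts, p_threshold=1e-4):
    """Call a mark present (1) in a bin when its count is unlikely under a uniform Poisson background."""
    background = sum(counts) / len(counts)     # mean reads per bin, used as the background rate
    if background == 0:
        return [0] * len(counts)
    return [1 if poisson_sf(c, background) < p_threshold else 0 for c in counts]

# Hypothetical read counts for one mark in consecutive fixed-size bins.
h3k4me3_counts = [1, 0, 2, 30, 45, 3, 1, 0, 28, 2]
print(binarize(h3k4me3_counts))
```

Running this for every mark yields the binary matrix (one 0/1 vector per bin) that the chromatin-state model consumes.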

ChromHMM and segmentation
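
ChromHMM models the binarized marks with a multivariate hidden Markov model in which each hidden chromatin state emits every mark independently (a product of Bernoulli distributions); segmentation assigns each bin a state. A minimal sketch of a decoding step only, assuming two hypothetical states with hand-set parameters; learning the parameters from data is omitted, and Viterbi decoding is used here as one standard way to decode an HMM, so ChromHMM's exact procedure may differ.

```python
import math

def viterbi(observations, start, trans, emit):
    """Most likely state path for binary mark vectors under independent-Bernoulli emissions.

    observations: list of tuples of 0/1 (one entry per mark) for consecutive bins.
    start[s], trans[s][t]: initial and transition probabilities.
    emit[s][m]: probability that mark m is present in state s.
    """
    states = list(start)

    def log_emission(s, obs):
        return sum(math.log(emit[s][m] if x else 1 - emit[s][m]) for m, x in enumerate(obs))

    scores = {s: math.log(start[s]) + log_emission(s, observations[0]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for t in states:
            best_prev = max(states, key=lambda s: scores[s] + math.log(trans[s][t]))
            new_scores[t] = scores[best_prev] + math.log(trans[best_prev][t]) + log_emission(t, obs)
            pointers[t] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=scores.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-state model over two marks: (H3K4me3, H3K27me3).
start = {"active_promoter": 0.5, "repressed": 0.5}
trans = {
    "active_promoter": {"active_promoter": 0.9, "repressed": 0.1},
    "repressed": {"active_promoter": 0.1, "repressed": 0.9},
}
emit = {"active_promoter": [0.9, 0.05], "repressed": [0.05, 0.8]}
bins = [(1, 0), (1, 0), (0, 0), (0, 1), (0, 1)]
print(viterbi(bins, start, trans, emit))
```

The unmarked middle bin is assigned to "repressed" because that state emits an all-absent vector more often in this toy model, showing how the segmentation uses both emissions and neighboring bins.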

Chromatin states and diseases. Intersecting strong enhancer states with disease-associated SNPs from GWAS shows significant enrichment in relevant cell types.
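
Enrichment of this kind can be summarized by comparing the fraction of GWAS SNPs falling in a state with the fraction of the genome that state covers. A minimal sketch with made-up numbers, using a one-sided binomial test that treats SNPs as independent (an assumption that linkage disequilibrium violates); published analyses often use more careful null models.

```python
import math

def binomial_sf(k, n, p):
    """One-sided P(X >= k) for X ~ Binomial(n, p), summed in log space (0 < p < 1)."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def state_enrichment(snps_in_state, total_snps, state_fraction_of_genome):
    """Fold enrichment and binomial p-value of GWAS SNPs overlapping a chromatin state."""
    observed = snps_in_state / total_snps
    fold = observed / state_fraction_of_genome
    p_value = binomial_sf(snps_in_state, total_snps, state_fraction_of_genome)
    return fold, p_value

# Hypothetical numbers: 60 of 400 trait SNPs fall in strong-enhancer states
# that cover 5% of the genome in the cell type of interest.
fold, p = state_enrichment(snps_in_state=60, total_snps=400, state_fraction_of_genome=0.05)
print(f"fold enrichment = {fold:.1f}, p = {p:.1e}")
```

Repeating the test for each cell type's states is what reveals enrichment specifically in the relevant cell types.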

Roadmap Epigenomics

Resources

Roadmap Epigenomics portal http://www.roadmapepigenomics.org

ENCODE portal https://www.encodeproject.org/

BLUEPRINT Epigenome (http://www.blueprint-epigenome.eu): BLUEPRINT focuses on haematopoietic cells from healthy and diseased individuals.

International Human Epigenome Consortium (IHEC): IHEC makes available comprehensive sets of reference epigenomes relevant to health and disease.

HaploReg: www.broadinstitute.org/mammals/haploreg/haploreg.php

RegulomeDB: http://www.regulomedb.org/

SCREEN: http://screen.umassmed.edu/

Interesting directions: single-cell epigenomics; genome editing to uncover other uncharacterized/unmarked regulatory elements (see, for example, Rajagopal et al., Nat Biotech 2016); epigenome-wide association studies (EWAS).

pinellolab.org Computational positions available* * Boats not included