Making Deep Learning Understandable for Analyzing Sequen;al Data about Gene Regula;on. Dr. Yanjun Qi 2017/11/26
|
|
- Evangeline Cummings
- 5 years ago
- Views:
Transcription
1 Making Deep Learning Understandable for Analyzing Sequen;al Data about Gene Regula;on Dr. Yanjun Qi 2017/11/26
2 Roadmap ² Background of Machine Learning ² Background of Sequen?al Data about Gene Regula?on ² AFen?veChrome for understanding gene regula?on by selec?ve afen?on on chroma?n 2
3 Roadmap ² Background of Machine Learning ² Background of Sequen?al Data about Gene Regula?on ² AFen?veChrome for understanding gene regula?on by selec?ve afen?on on chroma?n 3
4 Machine Learning is Changing the World Apple Siri / Amazon Echo Many more! 4
5 Challenge of data explosion Molecular signatures of tumor / blood sample Signs & Symptoms Tradi?onal Approaches Pa?ent Medical History & Demographics Public Health Data Gene?c Data Mobile medical sensor data Medical Images Data-Driven Approaches Machine Learning 5
6 What is Machine Learning? Training Input : Output: X Y Func2on f(x) 6
7 What is Machine Learning? Training Tes2ng Func2on Input : X Output: Y f(x) 7
8 [Example:] What is Machine Learning? Output: CAR Inputs: Training Func2on f(x) 8
9 [Example:] What is Machine Learning? Tes2ng Func2on f(x) CAR Yes (+1) 9
10 [Example:] Classifica;on task in Machine Learning Class: Car Tes2ng Func2on f(x) YES NO (+1) (-1) 10
11 Sequen;al Data Sequen2al Data: Strings, signals etc. This food is not good. 11
12 [Example:] Classifica;on of Sequen;al Data This Food is not good. X Func2on f(x) NO (-1) 12
13 State-of-the-art Machine Learning - Deep Neural Networks (DNN) Dog DeMo Dashboard - Lanchan?n, Singh, Wang, & Qi 13 University of Virginia
14 State-of-the-art Machine Learning - Deep Neural Networks (DNN) Dog DeMo Dashboard - Lanchan?n, Singh, Wang, & Qi 14 University of Virginia
15 State-of-the-art Machine Learning - Deep Neural Networks (DNN) Dog ATGCGATCAAGTCTG Protein Binding Site DeMo Dashboard - Lanchan?n, Singh, Wang, & Qi 15 University of Virginia
16 Deep Neural Networks (DNN) X Y f 1 (. ) f 2 (. ) f 3 (. ) f 4 (. ) Y=f 1 (X) f 2 f 4 (f (f 13 (X)) (f 2 (f 1 (X)))) 16
17 DeMo Dashboard - Lanchan?n, Singh, Wang, & Qi University of Virginia Dog ATGCGATCAAGTCTG Protein Binding Site
18 Accurate Our Goals DNNs are hard to Interpret Understandable 18
19 Roadmap ² Background of Machine Learning ² Background of Sequen?al Data about Gene Regula?on ² AFen?veChrome for understanding gene regula?on by selec?ve afen?on on chroma?n 19
20 Biology in a Slide Transcrip2on Transla2on DNA RNA PROTEIN CELL ORGANISM 20
21 Genome Organiza;on and Gene Regula;on Level 1 Level 2 Level 3 Regulatory Elements Genes Promoters Enhancers Chroma?n Structure Histone Modifica?ons DNA methyla?on Chroma?n remodeling Nuclear Architecture Chromosomal organiza?on Long range interac?ons Cellular Phenotype (adapted from Babu et al., 2008)
22 Level 1 Level 2 Regulatory Elements Genes Promoters Enhancers Chroma?n Structure Histone Modifica?ons DNA methyla?on Chroma?n remodeling ENCODE Project (2003-Present) Describe the func?onal elements encoded in human DNA
23 Level 1 Regulatory Elements Genes Promoters Enhancers Level 2 Chroma?n Structure Histone Modifica?ons DNA methyla?on Chroma?n remodeling ENCODE Project (2003-) Roadmap Epigene2cs Project (REMC, 2008-) Describe the func?onal elements encoded in human DNA To produce a public resource of epigenomic maps for stem cells and primary ex vivo?ssues selected to represent the normal counterparts of?ssues and organ systems frequently involved in human disease.
24 Current Available Large-Scale Data about Gene Transcrip2on TF Binding DNA Signals Segments on Gene Expression Genomes ATGCGATCAAGTCTG Histone Modifica?on Signals 24
25 Two Important Data-Driven Computa2onal Tasks TF Binding This Talk DNA Signals Segments on Gene Expression Genomes ATGCGATCAAGTCTG Histone Modifica?on Signals 25
26 Accurate Understandable One main aim of such data analysis is to understand data and to discover knowledge. 26
27 Roadmap ² Background of Machine Learning ² Background of Sequen?al Data about Gene Regula?on ² AFen?veChrome for understanding gene regula?on by selec?ve afen?on on chroma?n 27
28 Chroma;n Control Gene Expression 28
29 Transcrip;on Factor Binding => Gene Transcrip;on Gene Transcrip?on Factors TF ATCGCGTAGCTAGGGATGACAGACACACATAATGT Gene DNA 29
30 Histone Modifica;ons (HM) TF Gene DNA Gene Histone 30
31 Histone Modifica;on and Gene Transcrip;on Transcrip?on Factor (TF) Gene Transcrip?on Histone Modifica?on (HM) 31
32 Histone Modifica;on and Gene Transcrip;on Transcrip?on Factor (TF) Gene Transcrip?on Histone Modifica?on (HM)? 32
33 Histone Modifica;on and Gene Transcrip;on Transcrip?on Factor (TF) Gene Transcrip?on Histone Modifica?on (HM)? 33
34 Why Studying [HM => Gene Expression]? Epigenomics: Study of chemical changes in DNA and histones (without altering DNA sequence) Inheritable and involved in regula?ng gene expression, development,?ssue differen?a?on and suppression Modifica?on in DNA/histones (changes in chroma?n structure and func?on) => influence how easily DNA can be accessed by TF Epigenome is dynamic Can be altered by environmental condi?ons Unlike gene?c muta?ons, chroma?n changes such as histone modifica?ons are poten?ally reversible => Epigenome Drug for Cancer Cells?
35 Study what HMs affect which genes in what cells? Gene A Gene B HM1 HM2 HM3 HM1 HM2 HM3 DNA 35
36 Task Formula;on Predic2on Task Input : Histone Modifica?on Signals HM1 HM2 Gene DNA HM3 HM4 HM5 Output : Gene ON/OFF 36
37 Input Histone Modifica?on Signals HM1 HM2 Transcrip?on Start Site Gene DNA HM3 HM4 HM5 37
38 Input Histone Modifica?on Signals HM1 HM2 Transcrip?on Start Site Gene DNA HM3 HM4 HM5 38
39 Challenge Histone Modifica?on Signals HM1 HM2 Transcrip?on Start Site Gene DNA HM3 HM4 HM5?????? 39
40 Challenge Histone Modifica?on Signals HM1 HM2 Transcrip?on Start Site Gene DNA? HM3 HM4?? HM5??????? 40
41 Computa;onal Challenge Input Output Gene HM1 DNA Gene? HM2 DNA Gene? HM5 DNA??? Gene Search Space per Gene = 2 100x5 41
42 Related Work Linear Regression, SVM, Random Forest Gene ON/OFF 42
43 Related Work Linear Regression, SVM, Random Forest Gene ON/OFF 43
44 Drawback of Related Works Linear Regression, SVM, Random Forest Gene ON/OFF 44
45 Deep Learning to Rescue: Deep Neural Network (DNN) 45
46 Deep Learning to Rescue: Input HUMAN TREE SKY Output Park HM1 DNA Gene HM5 DNA 46
47 Accurate DeepChrome: Convolu?on Neural Network (CNN) Understandable R. Singh, Jack Lanchan?n, Gabriel Robins, and Yanjun Qi. DeepChrome: Deep-learning for predic?ng gene expression from histone modifica?ons. Bioinforma)cs. (ECCB) (2016) 47
48 Data Transcrip?on Start Site HM1 HM2 HM3 HM4 Gene A Gene A Gene A Gene A HM bp bp Bin # Bin # HM1 HM2 HM3 HM4 HM5 Gene A 48
49 DeepChrome-Convolu;onal Neural Network (CNN) HM1 HM2 5. Sotmax HM3 HM4 HM5 X 1. Convolu?on 2. Max Pooling 3. Dropout 4. Mul?-Layer Perceptron f(x) -1/+1 y N samp L = loss( f ( X (n) ), y (n) )! n=1
50 DeepChrome-Convolu;onal Neural Network (CNN) HM1 HM2 5. Sotmax HM3 L HM4 HM5 X 1. Convolu?on 2. Max Pooling 3. Dropout 4. Mul?-Layer Perceptron Back-propaga2on: Θ Θ η L! Θ
51 Experimental Setup Cell-types: 56 Input (HM): ChIP-Seq Maps (REMC) Output (Gene Expression): Discre?zed RNA-Seq (REMC) Histone Mark H3K27me3 H3K36me3 H3K4me1 H3K4me3 H3K9me3 Func2onal Category Repressor Promoter Distal Promoter Promoter Repressor Baselines: Support Vector Classifier (SVC) and Random Forest Classifier (RFC) Training Set 6601 Genes Valida?on Set 6601 Genes Test Set 6600 Genes 51
52 Results: Accuracy AUC Score Ø First deep learning implementa?on for gene expression predic?on. Ø But hard to interpret. 56 Cell-types DeepChrome SVC RFC 52
53 Accurate DeepChrome AFen?veChrome: understanding gene regula?on by selec?ve afen?on on chroma?n Understandable Goal: one DNN both accurate and interpretable Ritambhara Singh, Jack Lanchan?n, Arshdeep Sekhon, Yanjun Qi, "AFend and Predict: Understanding Gene Regula?on by 53 Selec?ve AFen?on on Chroma?n", to appear at (NIPS-17 )
54 Interpretability by Aben;on Input Aben2on Mechanism Output Park HM1 DNA Gene Gene HM2 DNA Gene 54
55 HM1 HM2 Aben;ve Chrome HM5 Ritambhara Singh, Jack Lanchan?n, Arshdeep Sekhon, Yanjun Qi, "AFend and Predict: Understanding Gene Regula?on by 55 Selec?ve AFen?on on Chroma?n", to appear at (NIPS-17 )
56 Two Levels of Aben;on 56
57 Input LOCAL HM1 HM2 HM5 LEVEL 57
58 Input LOCAL HM1 HM2 HM5 LEVEL 58
59 Overview Input LOCAL HM1 HM2 HM3 LEVEL 59
60 Overview GLOBAL LEVEL Input LOCAL HM1 HM2 HM3 LEVEL 60
61 Overview Gene Output GLOBAL LEVEL Input LOCAL HM1 HM2 HM5 LEVEL 61
62 Recurrent Neural Network (RNN) HM1 RNN RNN Recurrent Neural Network (RNN) RNN RNN 62
63 Aben2on Mechanism W AFen?on Mechanism AFen?on Mechanism AFen?on Mechanism AFen?on Mechanism AFen?on Mechanism HM1 RNN RNN Recurrent Neural Network (RNN) RNN RNN 63
64 Versus Baselines 64
65 Experiments: Predic;on Performance Same setup as DeepChrome AFen?veChrome is as accurate as (slightly befer than) DeepChrome 65
66 Experiments: Interpretability Local-level (HM-level) AFen?on Global-level (HM interac?ons) AFen?on 66
67 (1) Visualiza;on of Local Aben;on Weights (Learned from Data) Ø Addi?onal signal - H3K27ac (H-Ac?ve) from REMC Ø Average local afen?on weights of gene=on correspond well with H- ac?ve Ø Indica?ng AFen?veChrome is focusing on the correct bin posi?ons 67
68 (2) Visualiza;on of Global Aben;on Weights (Learned from Data) Ø An important differen?ally regulated gene (PAX5) across three blood lineage cell types: Ø H1-hESC (stem cell), Ø GM12878 (blood cell), Ø K562 (leukemia cell). Ø Trend of its global weights (beta) Verified through the literature. 68
69 (3) Comparison with State-of-Art Deep-Visualiza;on Methods Correla?on between local-level (HM-level) afen?on weights and the addi?onal signal - H3K27ac (H-Ac?ve) from REMC 69
70 Summary code available at: deepchrome.org Ø Aben2ve DeepChrome Both accurate and interpretable Novel implementa?on of deep afen?on mechanism Importance analysis at both HM and HM-HM level Accurate Understandable 70
71 References Ritambhara Singh, Jack Lanchan?n, Gabriel Robins, and Yanjun Qi. DeepChrome: Deep-learning for predic?ng gene expression from histone modifica?ons. Bioinforma)cs. (ECCB) (2016) Ritambhara Singh, Jack Lanchan?n, Arshdeep Sekhon, Yanjun Qi, "AFend and Predict: Understanding Gene Regula?on by Selec?ve AFen?on on Chroma?n", to appear at NIPS (2017 ) 71
72 Acknowledgements Ritambhara Singh Jack Lanchan?n Arshdeep Sekhon UVA Department of Biochemistry and Molecular Gene2cs: Dr. Mazhar Adli 72
73 Thank you 73