Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation. Dr. Yanjun Qi 2017/11/26


Transcription

1 Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation Dr. Yanjun Qi 2017/11/26

2 Roadmap • Background of Machine Learning • Background of Sequential Data about Gene Regulation • AttentiveChrome for understanding gene regulation by selective attention on chromatin 2

3 Roadmap • Background of Machine Learning • Background of Sequential Data about Gene Regulation • AttentiveChrome for understanding gene regulation by selective attention on chromatin 3

4 Machine Learning is Changing the World Apple Siri / Amazon Echo Many more! 4

5 Challenge of data explosion Molecular signatures of tumor / blood sample Signs & Symptoms Traditional Approaches Patient Medical History & Demographics Public Health Data Genetic Data Mobile medical sensor data Medical Images Data-Driven Approaches Machine Learning 5

6 What is Machine Learning? Training Input: X Output: Y Function f(x) 6

7 What is Machine Learning? Training Testing Function Input: X Output: Y f(x) 7

8 [Example:] What is Machine Learning? Output: CAR Inputs: Training Function f(x) 8

9 [Example:] What is Machine Learning? Testing Function f(x) CAR Yes (+1) 9

10 [Example:] Classification task in Machine Learning Class: Car Testing Function f(x) YES (+1) NO (-1) 10

11 Sequential Data Sequential Data: Strings, signals etc. This food is not good. 11

12 [Example:] Classification of Sequential Data This food is not good. X Function f(x) NO (-1) 12

13 State-of-the-art Machine Learning - Deep Neural Networks (DNN) Dog DeMo Dashboard - Lanchantin, Singh, Wang, & Qi 13 University of Virginia

14 State-of-the-art Machine Learning - Deep Neural Networks (DNN) Dog DeMo Dashboard - Lanchantin, Singh, Wang, & Qi 14 University of Virginia

15 State-of-the-art Machine Learning - Deep Neural Networks (DNN) Dog ATGCGATCAAGTCTG Protein Binding Site DeMo Dashboard - Lanchantin, Singh, Wang, & Qi 15 University of Virginia

16 Deep Neural Networks (DNN) Layers $f_1(\cdot)$, $f_2(\cdot)$, $f_3(\cdot)$, $f_4(\cdot)$ map input X to output Y: $Y = f_4(f_3(f_2(f_1(X))))$ 16
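
The layer-by-layer composition on this slide can be illustrated with a minimal sketch. The four toy functions below are hypothetical stand-ins for f1 to f4, not actual network layers; the point is only that the output of each layer feeds the next.

```python
from functools import reduce

def compose(*layers):
    """Return a function that applies the layers left to right,
    so compose(f1, f2)(x) == f2(f1(x))."""
    return lambda x: reduce(lambda acc, layer: layer(acc), layers, x)

# Hypothetical toy "layers" standing in for f1..f4 on the slide.
f1 = lambda x: x + 1
f2 = lambda x: x * 2
f3 = lambda x: x - 3
f4 = lambda x: x ** 2

network = compose(f1, f2, f3, f4)
y = network(2)  # evaluates f4(f3(f2(f1(2))))
```

In a real DNN each layer would be a parameterized transformation (for example a linear map followed by a nonlinearity), but the nesting is the same.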

17 DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia Dog ATGCGATCAAGTCTG Protein Binding Site

18 Our Goals: Accurate, Understandable. DNNs are hard to interpret. 18

19 Roadmap • Background of Machine Learning • Background of Sequential Data about Gene Regulation • AttentiveChrome for understanding gene regulation by selective attention on chromatin 19

20 Biology in a Slide DNA -> (Transcription) -> RNA -> (Translation) -> PROTEIN -> CELL -> ORGANISM 20

21 Genome Organization and Gene Regulation Level 1: Regulatory Elements (Genes, Promoters, Enhancers). Level 2: Chromatin Structure (Histone Modifications, DNA methylation, Chromatin remodeling). Level 3: Nuclear Architecture (Chromosomal organization, Long range interactions). Cellular Phenotype (adapted from Babu et al., 2008)

22 Level 1: Regulatory Elements (Genes, Promoters, Enhancers). Level 2: Chromatin Structure (Histone Modifications, DNA methylation, Chromatin remodeling). ENCODE Project (2003-Present): Describe the functional elements encoded in human DNA

23 Level 1: Regulatory Elements (Genes, Promoters, Enhancers). Level 2: Chromatin Structure (Histone Modifications, DNA methylation, Chromatin remodeling). ENCODE Project (2003-): Describe the functional elements encoded in human DNA. Roadmap Epigenomics Project (REMC, 2008-): To produce a public resource of epigenomic maps for stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease.

24 Currently Available Large-Scale Data about Gene Transcription: DNA Segments on Genomes (ATGCGATCAAGTCTG), TF Binding Signals, Histone Modification Signals, Gene Expression 24

25 Two Important Data-Driven Computational Tasks (This Talk): DNA Segments on Genomes (ATGCGATCAAGTCTG), TF Binding Signals, Histone Modification Signals, Gene Expression 25

26 Accurate Understandable One main aim of such data analysis is to understand data and to discover knowledge. 26

27 Roadmap • Background of Machine Learning • Background of Sequential Data about Gene Regulation • AttentiveChrome for understanding gene regulation by selective attention on chromatin 27

28 Chromatin Controls Gene Expression 28

29 Transcription Factor Binding => Gene Transcription Gene Transcription Factors TF ATCGCGTAGCTAGGGATGACAGACACACATAATGT Gene DNA 29

30 Histone Modifications (HM) TF Gene DNA Gene Histone 30

31 Histone Modification and Gene Transcription Transcription Factor (TF) Gene Transcription Histone Modification (HM) 31

32 Histone Modification and Gene Transcription Transcription Factor (TF) Gene Transcription Histone Modification (HM) ? 32

33 Histone Modification and Gene Transcription Transcription Factor (TF) Gene Transcription Histone Modification (HM) ? 33

34 Why Study [HM => Gene Expression]? Epigenomics: Study of chemical changes in DNA and histones (without altering DNA sequence). Inheritable and involved in regulating gene expression, development, tissue differentiation and suppression. Modification in DNA/histones (changes in chromatin structure and function) => influences how easily DNA can be accessed by TF. Epigenome is dynamic: it can be altered by environmental conditions. Unlike genetic mutations, chromatin changes such as histone modifications are potentially reversible => Epigenome Drug for Cancer Cells?

35 Study which HMs affect which genes in which cells? Gene A Gene B HM1 HM2 HM3 HM1 HM2 HM3 DNA 35

36 Task Formulation Prediction Task Input: Histone Modification Signals HM1 HM2 HM3 HM4 HM5 Gene DNA Output: Gene ON/OFF 36

37 Input Histone Modification Signals HM1 HM2 HM3 HM4 HM5 Transcription Start Site Gene DNA 37

38 Input Histone Modification Signals HM1 HM2 HM3 HM4 HM5 Transcription Start Site Gene DNA 38

39 Challenge Histone Modification Signals HM1 HM2 HM3 HM4 HM5 Transcription Start Site Gene DNA ? ? ? ? ? ? 39

40 Challenge Histone Modification Signals HM1 HM2 HM3 HM4 HM5 Transcription Start Site Gene DNA ? ? ? ? ? ? ? ? ? ? 40

41 Computational Challenge Input: HM1, HM2, ..., HM5 along the DNA. Output: Gene ? Search Space per Gene = $2^{100 \times 5}$ 41
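
To get a feel for the size of the search space quoted on this slide, the arithmetic can be worked out directly. The sketch below assumes the exponent means 100 positions for each of 5 histone modifications, with each (position, HM) pair either relevant or not; that reading is an interpretation of the slide, not something it states explicitly.

```python
# Assumed reading of the slide: 100 positions per HM, 5 HMs,
# each (position, HM) pair is either "relevant" or "not relevant".
NUM_POSITIONS = 100
NUM_HMS = 5

search_space = 2 ** (NUM_POSITIONS * NUM_HMS)  # number of possible subsets
num_digits = len(str(search_space))            # size of that number in decimal digits

print(f"2^{NUM_POSITIONS * NUM_HMS} has {num_digits} decimal digits")
```

A number with this many digits rules out exhaustive enumeration per gene, which is the motivation for learning the relevant positions from data.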

42 Related Work Linear Regression, SVM, Random Forest Gene ON/OFF 42

43 Related Work Linear Regression, SVM, Random Forest Gene ON/OFF 43

44 Drawback of Related Works Linear Regression, SVM, Random Forest Gene ON/OFF 44

45 Deep Learning to Rescue: Deep Neural Network (DNN) 45

46 Deep Learning to Rescue: Input HUMAN TREE SKY Output Park HM1 DNA Gene HM5 DNA 46

47 Accurate DeepChrome: Convolutional Neural Network (CNN) Understandable R. Singh, Jack Lanchantin, Gabriel Robins, and Yanjun Qi. DeepChrome: Deep-learning for predicting gene expression from histone modifications. Bioinformatics (ECCB) (2016) 47

48 Data [Figure: signals of HM1 to HM5 around the Transcription Start Site of Gene A; the bp axis is divided into numbered bins (Bin #), giving one row of binned values per HM for Gene A] 48
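
The binning of a signal around the Transcription Start Site can be sketched as follows. The region size and bin width did not survive in this transcript, so `flank=5000` and `bin_size=100` are illustrative assumptions, as are the function name and the toy read positions.

```python
def bin_signal(read_positions, tss, flank=5000, bin_size=100):
    """Count reads per fixed-width bin in the window [tss - flank, tss + flank).

    flank and bin_size are assumed values for illustration only.
    """
    start = tss - flank
    num_bins = (2 * flank) // bin_size
    counts = [0] * num_bins
    for pos in read_positions:
        idx = (pos - start) // bin_size
        if 0 <= idx < num_bins:  # ignore reads outside the window
            counts[idx] += 1
    return counts

# Hypothetical read positions for one HM near a gene whose TSS is at 25,000.
reads = [20_010, 20_050, 24_999, 25_000, 29_999, 40_000]
counts = bin_signal(reads, tss=25_000)
```

Repeating this for each HM and stacking the resulting rows gives one HM-by-bin matrix per gene, which is the kind of input the later slides feed to the network.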

49 DeepChrome-Convolutional Neural Network (CNN) Input X (HM1, HM2, HM3, HM4, HM5) -> 1. Convolution -> 2. Max Pooling -> 3. Dropout -> 4. Multi-Layer Perceptron -> 5. Softmax -> f(x), with label y in {-1, +1}. Loss: $L = \sum_{n=1}^{N_{samp}} loss(f(X^{(n)}), y^{(n)})$
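
The five stages named on this slide can be traced in a small pure-Python forward pass. This is a sketch under stated assumptions, not the DeepChrome implementation: the filter width, number of filters, pool size, ReLU nonlinearity, and the use of a single linear layer in place of the multi-layer perceptron are all illustrative choices, and the weights are random.

```python
import math
import random

def conv1d(x, kernels):
    """Valid 1D convolution along the bin axis; each kernel spans all HM rows."""
    num_bins = len(x[0])
    maps = []
    for k in kernels:
        width = len(k[0])
        fmap = []
        for start in range(num_bins - width + 1):
            s = sum(k[h][j] * x[h][start + j]
                    for h in range(len(x)) for j in range(width))
            fmap.append(max(0.0, s))  # ReLU (assumed; the slide names no nonlinearity)
        maps.append(fmap)
    return maps

def max_pool(fmap, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(fmap[i:i + size]) for i in range(0, len(fmap) - size + 1, size)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def forward(x, kernels, pool_size, weights, biases):
    pooled = [v for fmap in conv1d(x, kernels) for v in max_pool(fmap, pool_size)]
    # Dropout is only active during training; at inference it is the identity.
    logits = [sum(w * v for w, v in zip(row, pooled)) + b
              for row, b in zip(weights, biases)]  # one linear layer stands in for the MLP
    return softmax(logits)

rng = random.Random(0)
NUM_HMS, NUM_BINS, WIDTH, NUM_KERNELS, POOL = 5, 100, 10, 4, 5  # assumed sizes
x = [[rng.random() for _ in range(NUM_BINS)] for _ in range(NUM_HMS)]
kernels = [[[rng.uniform(-1, 1) for _ in range(WIDTH)] for _ in range(NUM_HMS)]
           for _ in range(NUM_KERNELS)]
feat_len = NUM_KERNELS * ((NUM_BINS - WIDTH + 1) // POOL)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(feat_len)] for _ in range(2)]
biases = [0.0, 0.0]
probs = forward(x, kernels, POOL, weights, biases)  # two class probabilities (OFF, ON)
```

The loss on the slide would then sum a per-sample loss of these probabilities against the true labels over all training genes.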

50 DeepChrome-Convolutional Neural Network (CNN) Input X (HM1, HM2, HM3, HM4, HM5) -> 1. Convolution -> 2. Max Pooling -> 3. Dropout -> 4. Multi-Layer Perceptron -> 5. Softmax -> L. Back-propagation: $\Theta \leftarrow \Theta - \eta \frac{\partial L}{\partial \Theta}$
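
The parameter update on this slide (move the parameters against the gradient of the loss, scaled by a learning rate) can be shown on a toy problem. The quadratic loss and the learning rate below are hypothetical; in the network the gradient would come from back-propagation rather than a closed-form expression.

```python
def grad_step(theta, grad, eta):
    """One gradient-descent update: theta <- theta - eta * dL/dtheta."""
    return [t - eta * g for t, g in zip(theta, grad)]

# Toy loss L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = [0.0]
eta = 0.1  # learning rate (assumed value)
for _ in range(100):
    grad = [2.0 * (theta[0] - 3.0)]
    theta = grad_step(theta, grad, eta)
```

After enough steps `theta` settles near the minimizer of the toy loss, which is the behavior the update rule is designed to produce.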

51 Experimental Setup Cell-types: 56. Input (HM): ChIP-Seq Maps (REMC). Output (Gene Expression): Discretized RNA-Seq (REMC). Histone Mark (Functional Category): H3K27me3 (Repressor), H3K36me3 (Promoter), H3K4me1 (Distal Promoter), H3K4me3 (Promoter), H3K9me3 (Repressor). Baselines: Support Vector Classifier (SVC) and Random Forest Classifier (RFC). Training Set: 6601 Genes. Validation Set: 6601 Genes. Test Set: 6600 Genes 51
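
The slide says the RNA-Seq output is discretized but does not say how. One common choice, assumed here purely for illustration, is to threshold each gene's expression at the median across genes, labeling genes above it ON (+1) and the rest OFF (-1). The gene names and values are made up.

```python
from statistics import median

def discretize(expression):
    """Map expression values to +1 (ON) / -1 (OFF) using the median as threshold.

    The median threshold is an assumption; the slide only says "discretized".
    """
    threshold = median(expression.values())
    return {gene: (1 if value > threshold else -1)
            for gene, value in expression.items()}

# Hypothetical expression values for four genes.
expression = {"geneA": 0.0, "geneB": 2.5, "geneC": 10.0, "geneD": 0.4}
labels = discretize(expression)

# The split sizes on the slide account for every gene exactly once.
total_genes = 6601 + 6601 + 6600
```

A median threshold yields roughly balanced classes, which keeps a binary classifier from trivially predicting the majority label.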

52 Results: Accuracy [Figure: AUC Score across 56 Cell-types for DeepChrome, SVC, and RFC] • First deep learning implementation for gene expression prediction. • But hard to interpret. 52
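
For readers unfamiliar with the AUC score used in this comparison, a minimal pairwise implementation is sketched below. It computes the fraction of (ON, OFF) gene pairs in which the ON gene receives the higher score, counting ties as half; the labels and scores are hypothetical.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (+1 = gene ON, -1 = gene OFF) and classifier scores.
labels = [1, 1, -1, -1]
scores = [0.9, 0.4, 0.6, 0.1]
auc = auc_score(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the metric does not depend on choosing a decision threshold.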

53 Accurate DeepChrome AttentiveChrome: understanding gene regulation by selective attention on chromatin Understandable Goal: one DNN both accurate and interpretable Ritambhara Singh, Jack Lanchantin, Arshdeep Sekhon, Yanjun Qi, "Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin", to appear at NIPS-17 53

54 Interpretability by Attention Input Attention Mechanism Output Park HM1 DNA Gene Gene HM2 DNA Gene 54

55 HM1 HM2 AttentiveChrome HM5 Ritambhara Singh, Jack Lanchantin, Arshdeep Sekhon, Yanjun Qi, "Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin", to appear at NIPS-17 55

56 Two Levels of Attention 56

57 Input HM1 HM2 HM5 LOCAL LEVEL 57

58 Input HM1 HM2 HM5 LOCAL LEVEL 58

59 Overview Input HM1 HM2 HM3 LOCAL LEVEL 59

60 Overview Input HM1 HM2 HM3 LOCAL LEVEL GLOBAL LEVEL 60

61 Overview Input HM1 HM2 HM5 LOCAL LEVEL GLOBAL LEVEL Gene Output 61

62 Recurrent Neural Network (RNN) HM1 RNN RNN RNN RNN 62

63 Attention Mechanism (W) over the outputs of the Recurrent Neural Network (RNN) units for HM1 63
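
A soft attention layer over RNN outputs can be sketched in a few lines. This assumes the simplest form, in which a learned vector scores each hidden state, a softmax turns the scores into weights, and the weighted sum becomes the summary; the slide shows only the symbol W, so the exact scoring function, the vector `w`, and the toy hidden states are assumptions.

```python
import math

def attend(hidden_states, w):
    """Soft attention: score each hidden state with w, softmax the scores,
    and return (attention weights, weighted sum of hidden states)."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden_states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
               for d in range(dim)]
    return alphas, context

# Hypothetical RNN outputs for three bins of one HM, and a hypothetical w.
hidden_states = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
w = [1.0, 0.0]
alphas, context = attend(hidden_states, w)
```

The weights `alphas` are what make the model inspectable: they sum to one and show which positions contributed most. The same operation can be applied a second time across the per-HM summaries to obtain the global-level weights.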

64 Versus Baselines 64

65 Experiments: Prediction Performance Same setup as DeepChrome. AttentiveChrome is as accurate as (slightly better than) DeepChrome 65

66 Experiments: Interpretability Local-level (HM-level) Attention Global-level (HM interactions) Attention 66

67 (1) Visualization of Local Attention Weights (Learned from Data) • Additional signal - H3K27ac (H-active) from REMC • Average local attention weights of gene=ON correspond well with H-active • Indicating AttentiveChrome is focusing on the correct bin positions 67

68 (2) Visualization of Global Attention Weights (Learned from Data) • An important differentially regulated gene (PAX5) across three blood lineage cell types: H1-hESC (stem cell), GM12878 (blood cell), K562 (leukemia cell). • Trend of its global weights (beta) verified through the literature. 68

69 (3) Comparison with State-of-the-Art Deep-Visualization Methods Correlation between local-level (HM-level) attention weights and the additional signal - H3K27ac (H-active) from REMC 69
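
The correlation mentioned on this slide could be computed as below. The slide does not say which correlation measure was used, so Pearson correlation is an assumption here, and the two short vectors are made-up stand-ins for per-bin attention weights and H3K27ac signal.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical per-bin values: attention weights vs. an independent active mark.
attention = [0.05, 0.10, 0.40, 0.30, 0.15]
h3k27ac = [1.0, 2.0, 9.0, 6.0, 3.0]
r = pearson(attention, h3k27ac)
```

A value of `r` close to 1 would indicate that the bins the model attends to coincide with bins where the independent signal is strong.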

70 Summary code available at: deepchrome.org • Attentive DeepChrome: Both accurate and interpretable. Novel implementation of deep attention mechanism. Importance analysis at both HM and HM-HM level. Accurate Understandable 70

71 References Ritambhara Singh, Jack Lanchantin, Gabriel Robins, and Yanjun Qi. DeepChrome: Deep-learning for predicting gene expression from histone modifications. Bioinformatics (ECCB) (2016). Ritambhara Singh, Jack Lanchantin, Arshdeep Sekhon, Yanjun Qi, "Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin", to appear at NIPS (2017). 71

72 Acknowledgements Ritambhara Singh, Jack Lanchantin, Arshdeep Sekhon. UVA Department of Biochemistry and Molecular Genetics: Dr. Mazhar Adli 72

73 Thank you 73