Random matrix analysis for gene co-expression experiments in cancer cells

1 Random matrix analysis for gene co-expression experiments in cancer cells. OIST-iTHES-CTSR 2016, July 9th, 2016. Ayumi KIKKAWA (MTPU, OIST)

2 Introduction: What is co-expression of genes? There are 20~30k genes in human DNA, including both coding and non-coding genes. Their transcripts form complex networks: gene interaction (regulatory) networks and protein-protein interactions. mRNA, non-coding RNA, microRNA, etc. Transcriptomes. Systems biology. Jonsson, P.F. and Bates, P.A. (2006) Global topological features of cancer proteins in the human interactome. Bioinformatics, 22,

3 From microarray experiments to gene interaction networks. NCBI GEO: GEO is an international public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community. Series: 70,997. Platforms: 16,042. Samples: 1,858,012. The expression of more than 10k genes is measured in a single assay. Meta-analysis over many experiments is possible. The gene interaction network should change its topology across cellular states, including disease. Gene interaction networks are inferred with a Bayesian network algorithm (SiGN-NNSR).

4 The cancer gene interaction database: TCNG. The Cancer Network Galaxy (TCNG). Nonparametric Bayesian network algorithm (SiGN). [Figure: experimental data (samples up to Sample n) are used to learn Bayesian networks (Bayesian network 1, Bayesian network 2) whose nodes are genes (Gene1, Gene2).] Tamada, Y. et al. Estimating genome-wide gene networks using nonparametric Bayesian network models on massively parallel computers. IEEE/ACM Trans. Comput. Biol. Bioinforma. 8, (2011). K (京): the RIKEN supercomputer. Based on 256 GEO datasets. Total nodes = Total edges ~ 16M

5 RMT analysis for gene interaction. Random matrix theory (RMT) can be applied to various biological networks, and we have previously studied protein-protein interaction (PPI) networks. In many organisms, the PPI network shows universal behavior: the nearest-neighbor level (NNL) spacing distribution P(s) follows the Wigner distribution. The important feature of these level statistics is that the eigenvalues (levels) of the adjacency matrix repel each other. In the opposite case, the levels are mutually uncorrelated and P(s) follows the Poisson distribution. The difference between the gene networks of normal and diseased cells is very important. We apply RMT to cancer gene networks to study whether cancer cells show distinctive topological behavior.
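
The comparison between the Wigner and Poisson cases can be sketched numerically. The slides do not state how the spectrum is unfolded, so the block below is only an illustration under assumptions: it trims the spectral edges, unfolds with a polynomial fit to the level staircase, and uses a random symmetric Gaussian matrix and independent uniform levels as stand-ins for a correlated and an uncorrelated spectrum. The names `unfolded_spacings`, `wigner_surmise` and `poisson_spacing` are hypothetical helpers, not part of any TCNG or SiGN tool.

```python
import numpy as np

def unfolded_spacings(eigenvalues, trim=0.1, poly_degree=5):
    """Nearest-neighbor spacings, rescaled to unit mean.

    Assumed unfolding: drop a fraction `trim` of levels at each edge,
    then fit a polynomial to the level staircase of the remaining bulk.
    """
    ev = np.sort(np.asarray(eigenvalues, dtype=float))
    k = int(trim * ev.size)
    bulk = ev[k:ev.size - k]
    staircase = np.arange(bulk.size, dtype=float)  # level index vs. level position
    smooth = np.polynomial.Polynomial.fit(bulk, staircase, poly_degree)
    s = np.diff(smooth(bulk))                      # spacings of the unfolded levels
    return s / s.mean()

def wigner_surmise(s):
    """Wigner distribution: P(s) = (pi/2) s exp(-pi s^2 / 4)."""
    return (np.pi / 2.0) * s * np.exp(-np.pi * s**2 / 4.0)

def poisson_spacing(s):
    """Poisson case (uncorrelated levels): P(s) = exp(-s)."""
    return np.exp(-s)

rng = np.random.default_rng(0)
n = 800

# Stand-in for a spectrum with level repulsion: random symmetric Gaussian matrix.
a = rng.normal(size=(n, n))
s_goe = unfolded_spacings(np.linalg.eigvalsh((a + a.T) / 2.0))

# Stand-in for uncorrelated levels: independent uniform random numbers.
s_poi = unfolded_spacings(rng.uniform(size=n))

# Level repulsion suppresses small spacings. From the formulas above, the
# fraction with s < 0.25 is about 0.048 (Wigner) and about 0.221 (Poisson).
frac_goe = float(np.mean(s_goe < 0.25))
frac_poi = float(np.mean(s_poi < 0.25))
print("fraction of spacings below 0.25:", frac_goe, frac_poi)
```

For a real network, the eigenvalues passed to `unfolded_spacings` would come from the adjacency matrix of the inferred gene network instead of the synthetic stand-ins.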

6 The workflow

7 The statistics of the TCNG data. [Figure axes: number of inferred edges, number of samples.] Frequency (edge attribute): an edge attribute calculated by SiGN-BN NNSR. It represents the frequency with which the edge was estimated during the iterations of the NNSR algorithm, and ranges from 0 to 1. By default, an edge with Freq greater than 0.2 is regarded as estimated. This value can be read as the confidence of the estimated edge. It represents neither the accuracy nor the strength of the edge.
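
The role of the Freq threshold can be illustrated with a small sketch. Everything here is an assumption made for illustration: the edge list is toy data with made-up gene names, `adjacency_from_edges` is a hypothetical helper, and treating the inferred edges as undirected (a symmetric 0/1 adjacency matrix) is a simplification that the slides do not confirm.

```python
import numpy as np

def adjacency_from_edges(edges, freq_threshold=0.2):
    """Keep edges with Freq > freq_threshold and build a symmetric 0/1 matrix.

    `edges` is an iterable of (gene_a, gene_b, freq) tuples (toy format, assumed).
    Genes left without any kept edge are dropped from the matrix.
    """
    kept = [(u, v) for u, v, f in edges if f > freq_threshold]
    genes = sorted({g for pair in kept for g in pair})
    index = {g: i for i, g in enumerate(genes)}
    adj = np.zeros((len(genes), len(genes)))
    for u, v in kept:
        adj[index[u], index[v]] = 1.0
        adj[index[v], index[u]] = 1.0  # assumption: direction is ignored
    return genes, adj

# Toy data only; not taken from TCNG.
toy_edges = [
    ("G1", "G2", 0.95),
    ("G2", "G3", 0.45),
    ("G3", "G4", 0.25),
    ("G1", "G4", 0.15),
]

for threshold in (0.2, 0.4):
    genes, adj = adjacency_from_edges(toy_edges, threshold)
    n_edges = int(adj.sum()) // 2  # each undirected edge is counted twice
    print(threshold, len(genes), n_edges)
```

Raising the threshold shrinks both the node set and the edge set, which is the knob varied when the confidence factor of the edges is changed.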

8 Change from the Poisson to the Wigner-Dyson (WD) distribution with network size. #236 (GSE7904): 51 samples, 8000 nodes, 32,124 edges. #165 (GSE29013): 50 samples, 8000 nodes, 51,702 edges.

9 Change from the Poisson to the Wigner-Dyson (WD) distribution with the confidence factor of the edges. #18 (GSE11135): 204 samples, 21,001 edges. #26 (GSE12276): 204 samples, edges

10 #92: 111 samples, 26,717 edges

11 Summary. i. From the viewpoint of RMT, we have observed universal behavior in gene interaction networks of cancer cells, using data from the TCNG database. ii. The nearest-neighbor spacing (NNS) distribution of the gene interaction matrix changes from the Poisson distribution to the Wigner distribution as the network size increases. iii. The same change from Poisson to Wigner is observed when the confidence threshold on inferred edges is made stricter. iv. So far in our studies, the Poisson distribution has been observed only in cancer-related molecular networks (PPI or gene interaction networks).