Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs. Bertil Schmidt Christian Hundt


1 Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs Bertil Schmidt Christian Hundt

2 Contents
- Gene Set Enrichment Analysis (GSEA)
- Background
- Algorithmic details
- cudaGSEA
- Performance evaluation

3 GSEA and Bioinformatics
- High-throughput technologies generate large-scale gene expression data sets
  - RNA-Seq
  - Microarrays
- GSEA uses annotated gene sets to mine a given gene expression matrix
- MSigDB contains over 10K signatures, each containing around 100 gene identifiers on average
- Typical GSEA study: identify metabolic pathways that are differentially changed in human type-2 diabetes

4 Gene Set Enrichment Analysis
- Reveals correlation between gene sets and diseases using gene expression data
- State-of-the-art tool with over 10,000 citations
- Written in (multi-threaded) Java
- Highly time-consuming: analyzing 20,639 genes measured in 200 patients with 4,726 pathways and 1M permutations takes around 1 week with the GSEA software on a CPU
- We present a parallelization of GSEA on a GPU using CUDA (cudaGSEA)
- cudaGSEA is around two orders of magnitude faster than BroadGSEA

5 GSEA Algorithm: Gene Ranking
- Gene expression matrix D is obtained from RNA-Seq or microarray experiments
- For each gene i and patient j with associated (binary) phenotype C, the expression value D[i,j] is stored
- Diseases are driven by complex gene interactions: simply reporting top-ranked genes produces many false positives
- Domain experts provide sets of genes that might explain the observed phenotypes

6 GSEA Algorithm: Enrichment Score
- The enrichment score (ES) measures the correlation between a given gene set S and the calculated gene ranking g(i)
- Report the maximum deviation from zero of a running sum over rank positions k
- The sum increases if we hit a member of S and decreases otherwise
- How significant is ES = 0.857? p-value calculation using permutation testing
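
The running-sum idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the cudaGSEA kernel: the function name `enrichment_score` and the `weight` parameter are assumptions made for the example. The step sizes follow the commonly published GSEA formulation (hits weighted by |score|^weight, misses by a constant); the slides do not say which weighting cudaGSEA uses.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Return the signed maximum deviation from zero of the GSEA running sum.

    ranked_genes: gene identifiers, already sorted by their ranking score.
    scores:       ranking scores aligned with ranked_genes.
    gene_set:     identifiers of the gene set S.
    weight:       exponent applied to |score| for hits (0 gives equal steps).
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must hit some, but not all, ranked genes")

    # Normalise hit steps so that they sum to 1 over the whole ranking.
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_norm == 0.0:
        raise ValueError("all hit scores are zero; use weight=0")

    running = 0.0
    best = 0.0
    for score, is_hit in zip(scores, hits):
        if is_hit:
            running += abs(score) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss                     # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    genes = ["a", "b", "c", "d"]
    scores = [4.0, 3.0, 2.0, 1.0]
    print(enrichment_score(genes, scores, {"a", "b"}, weight=0.0))
    print(enrichment_score(genes, scores, {"c", "d"}, weight=0.0))
```

A gene set concentrated at the top of the ranking yields a score near +1, one concentrated at the bottom a score near -1. Both step sizes sum to 1 over the full list, so the running sum returns to zero at the end.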

7 GSEA Algorithm: Permutation testing

8 GSEA Algorithm: Permutation testing

9 GSEA Algorithm: ES
- Histogram of 1,000,000 enrichment scores (ES) gained by permuting patient phenotypes
- Estimate the p-value by counting events in both tails
- Why so many permutations? When testing 1,000 gene sets at significance level p < 0.001, we need more than 1,000,000 samples to reject the null hypothesis at 1,000 · p < 0.001 (Bonferroni correction)
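
Counting events in both tails can be illustrated with a small permutation test. The sketch below is an assumption-laden stand-in: `permutation_p_value` and `mean_difference` are hypothetical names, and a simple difference of means replaces the enrichment score so that the example stays short. In GSEA the permuted statistic would be the ES recomputed from a ranking derived from the shuffled phenotype labels.

```python
import random


def mean_difference(values, labels):
    """Difference of means between label-1 and label-0 samples."""
    case = [v for v, l in zip(values, labels) if l == 1]
    ctrl = [v for v, l in zip(values, labels) if l == 0]
    return sum(case) / len(case) - sum(ctrl) / len(ctrl)


def permutation_p_value(statistic, values, labels, n_perm=10_000, seed=0):
    """Two-sided empirical p-value obtained by shuffling phenotype labels."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps the case/control counts fixed
        if abs(statistic(values, shuffled)) >= abs(observed):
            extreme += 1       # event in either tail of the null histogram
    return observed, extreme / n_perm


if __name__ == "__main__":
    values = [10.0, 11.0, 12.0, 13.0, 0.0, 1.0, 2.0, 3.0]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(permutation_p_value(mean_difference, values, labels))
```

In this toy data only 2 of the 70 possible labelings are as extreme as the observed one, so the estimate should land close to 2/70. The estimate can only resolve p-values in steps of 1/n_perm, which is why a Bonferroni-corrected threshold of 0.001/1,000 calls for on the order of a million permutations.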

10 CUDA Parallelization
- Transpose D to ensure coalesced memory accesses

11 CUDA Parallelization

12 CUDA Parallelization

13 CUDA Implementation Details
- Support for single precision and double precision
- The resulting matrix of enrichment scores (#gene sets x #permutations) can be large, e.g. 5K x 1M x 8 B = 40 GB
- p-value estimation, family-wise error rate (FWER), and normalized enrichment score (NES) computation can be accomplished on the GPU with (sum/max) reduction kernels without the need to store this matrix
- False discovery rate (FDR) computation: this matrix is transferred to the CPU for post-processing
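
The memory estimate and the idea of reducing scores without storing them can be sketched on the CPU as follows. This is only an analogy to the sum/max reduction kernels mentioned above: `es_matrix_bytes` and `TailCounter` are hypothetical names for this example, and batches of permuted scores are assumed to arrive one chunk at a time.

```python
def es_matrix_bytes(n_gene_sets, n_permutations, bytes_per_value):
    """Size in bytes of a dense (#gene sets x #permutations) score matrix."""
    return n_gene_sets * n_permutations * bytes_per_value


class TailCounter:
    """Streaming sum/max reductions over permuted scores of one gene set."""

    def __init__(self, observed):
        self.observed = observed
        self.n = 0            # number of permuted scores seen so far
        self.extreme = 0      # sum reduction: scores at least as extreme
        self.max_abs = 0.0    # max reduction: largest |score| seen

    def update(self, batch):
        for es in batch:
            self.n += 1
            if abs(es) >= abs(self.observed):
                self.extreme += 1
            self.max_abs = max(self.max_abs, abs(es))

    def p_value(self):
        return self.extreme / self.n


if __name__ == "__main__":
    # 5K gene sets x 1M permutations x 8 bytes (double precision)
    print(es_matrix_bytes(5_000, 1_000_000, 8) / 1e9, "GB")

    counter = TailCounter(observed=0.5)
    counter.update([0.1, -0.6, 0.5])   # first chunk of permuted scores
    counter.update([0.2])              # second chunk
    print(counter.p_value(), counter.max_abs)
```

Only a few counters per gene set are kept, so memory use is independent of the number of permutations. Statistics that need the full distribution across gene sets, such as the FDR, cannot be reduced this way, which matches the slide's note that the matrix is transferred to the CPU in that case.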

14 cudaGSEA Features
- Reads data sets directly in Broad Institute-compatible file formats
- Supports several local deviation measures
  - Mean-based measures (difference/quotient/log-quotient of means)
  - Mean and standard deviation-based measures (signal-to-noise ratio, t-tests, one/two-pass estimation)
- Numerically stable summation schemes for local measures and ES (Kahan etc.)
- Package for the R framework and standalone application
- Multi-threaded CPU version in C++ using OpenMP
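
As an illustration of one local deviation measure combined with a stable summation scheme, the sketch below computes a signal-to-noise ratio, (mean_A - mean_B) / (sd_A + sd_B), using Kahan summation and a two-pass variance estimate. The function names are hypothetical, the use of the sample standard deviation (n - 1) is an assumption, and any adjustments a production tool applies to very small standard deviations are left out.

```python
import math


def kahan_sum(values):
    """Compensated (Kahan) summation: tracks the rounding error of each add."""
    total = 0.0
    compensation = 0.0
    for value in values:
        y = value - compensation
        t = total + y
        compensation = (t - total) - y  # low-order bits lost in total + y
        total = t
    return total


def mean_and_std(values):
    """Two-pass estimate: mean first, then squared deviations from it."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples per phenotype")
    mean = kahan_sum(values) / n
    variance = kahan_sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)


def signal_to_noise(case, control):
    """Signal-to-noise ratio of one gene between two phenotype groups."""
    mean_a, sd_a = mean_and_std(case)
    mean_b, sd_b = mean_and_std(control)
    if sd_a + sd_b == 0.0:
        raise ValueError("both groups are constant; ratio is undefined")
    return (mean_a - mean_b) / (sd_a + sd_b)


if __name__ == "__main__":
    print(signal_to_noise([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```

Applying `signal_to_noise` to every row of the expression matrix and sorting genes by the result gives the ranking on which the enrichment score is computed. Compensated summation matters most in single precision, where long sums over many samples lose low-order bits quickly.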

15 Performance Evaluation
- GSE19429 dataset collapsed to 20,639 gene symbols; 200 patients (183 cases + 17 controls)
- Hallmark: 50 gene sets (MSigDB 5.1, smallest gene set collection)
- GeForce Titan X (single precision) / Tesla K40c (double precision, ECC off), CUDA core Xeon E5-2660v3@2.60GHz, 20 threads, Ubuntu 14.04, gcc 4.8.4, 64-bit OpenJDK BroadGSEA v.2.2.2

16 Performance Evaluation
- GSE19429 dataset collapsed to 20,639 gene symbols; 200 patients (183 cases + 17 controls)
- C2: 4,726 gene sets (MSigDB 5.1, largest gene set collection)
- GeForce Titan X (single precision) / Tesla K40c (double precision, ECC off), CUDA core Xeon E5-2660v3@2.60GHz, 20 threads, Ubuntu 14.04, gcc 4.8.4, 64-bit OpenJDK BroadGSEA v.2.2.2

17 Conclusion
- High-throughput technologies establish the need for scalable bioinformatics tools that can process large-scale gene expression data sets
- CUDA is a suitable technology to address this need
- cudaGSEA on one GPU achieves a speedup of around two orders of magnitude over BroadGSEA on a CPU: analyzing 20,639 genes measured in 200 patients with 4,726 pathways and 1M permutations takes around 1 week with GSEA on a Xeon E5-2660v3 CPU, but less than 1 hour on a GeForce Titan X
- Source code available at:
- Group Website:

18 Thank you!
Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs
Bertil Schmidt, Christian Hundt
Institute of Computer Science, Johannes Gutenberg University Mainz
{bertil.schmidt,