Accelerating Genomic Computations 1000X with Hardware

Yatish Turakhia, EE PhD candidate, Stanford University
Prof. Bill Dally (Electrical Engineering and Computer Science)
Prof. Gill Bejerano (Computer Science, Developmental Biology and Pediatrics)

DNA sequencing costs and data explosion
- Since 2003, genomics data has been doubling every 7 months!
- Exabyte data by 2025
- 100M to 2B genomes to be sequenced!
  Stephens, Zachary D., et al. "Big data: astronomical or genomical?" PLoS Biology (2015)
- "Storing and processing genome data will exceed the computing challenges of running YouTube and Twitter, biologists warn." [Nature News, 2015]
- "The decreasing cost of sequencing and the increasing number of sequence reads being generated are placing greater demand on the computational resources and knowledge necessary to handle sequence data." [Genome Biology, 2016]
[Figure labels: 1st gen, 2nd gen, 3rd gen]

Genomic Granular Computing Applications

Neonatal ICU
- 4 million newborns per year in the US alone
- 1 in 33 newborns with rare genetic conditions admitted to NICU
- Time is of the essence for genome-based diagnosis

Prenatal ICU and IVF clinics
- Non-invasively diagnose over 3,000 rare genetic conditions (e.g. Down Syndrome)
- Free-floating DNA in blood
- Enormous volume!

Liquid Biopsy
- Early cancer detection: a life-saving application for millions of individuals
- Non-invasive: circulating tumor DNA
- Periodic sequencing of healthy individuals: enormous volume!

Patient Diagnosis: Sample-to-answer
1. Genome sequencing machine: patient sample to reads (ATGTCGAT, CGATACGA, GAGTCATC, ACTGACGT)
2. Read assembly: reads to genome (3 billion base pairs)
3. Find the causal mutation among the mutations:
   REFERENCE: --ATGTCGATGATCCAGAGGATACTAGGATAT-
   PATIENT:   --ATGTCTATGATC--GAGGATATTAGGATAT-
- Long reads (>10 Kbp) offer a better resolution of the mutation spectrum but have a high error rate (15-40%)
- >1,300 CPU hours for reference-guided assembly of noisy long reads: 14.2M CPU-years for 100M individuals
- >15,600 CPU hours for de novo assembly of noisy long reads: 178M CPU-years for 100M individuals
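
The REFERENCE and PATIENT rows above can be compared column by column to list the mutations. The following is a minimal sketch of that idea, not the pipeline used in the talk; the function name `list_mutations` and the output tuple layout are assumptions made for illustration.

```python
def list_mutations(ref_row, pat_row):
    """List differences between two gapped alignment rows of equal length.

    Hypothetical helper: returns (reference position, kind, detail) tuples,
    where positions count reference bases only (0-based).
    """
    if len(ref_row) != len(pat_row):
        raise ValueError("alignment rows must have the same length")
    mutations = []
    ref_pos = 0
    for r, p in zip(ref_row, pat_row):
        if r == "-" and p == "-":
            continue  # padding column, nothing to compare
        if r == "-":
            mutations.append((ref_pos, "insertion", p))
        elif p == "-":
            mutations.append((ref_pos, "deletion", r))
        elif r != p:
            mutations.append((ref_pos, "substitution", f"{r}>{p}"))
        if r != "-":
            ref_pos += 1  # advance only when a reference base was consumed
    return mutations


reference = "ATGTCGATGATCCAGAGGATACTAGGATAT"
patient = "ATGTCTATGATC--GAGGATATTAGGATAT"
for mutation in list_mutations(reference, patient):
    print(mutation)
```

On the slide's example this reports two substitutions (G>T and C>T) and a two-base deletion, which is the kind of mutation list the final "find the causal mutation" step starts from.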

Darwin: A Genomics Co-processor
[Figure: software sends a query (Q) and reference (R) to Darwin through the D-SOFT API (D-SOFT, the filter) and the GACT API (GACT, the aligner)]
High speed and programmability
1. D-SOFT: Tunable speed/precision to match any error profile
2. GACT: First algorithm with O(1) memory for the compute-intensive step of alignment, allowing arbitrarily long alignments in hardware; ideal for long reads
3. First framework shown to accelerate reference-guided as well as de novo assembly of reads in hardware

Darwin: 40nm ASIC configuration
[Figure: software drives Darwin through the D-SOFT API and the GACT API; Darwin contains a D-SOFT unit, multiple GACT units, and LPDDR4 (32GB) memory]

Darwin
  Area:  300 mm^2
  Power: 9 W

Software (Intel Xeon E5)
  Algorithm   Power (1 thread)
  BWA-MEM     9.2 W
  GraphMap    10.7 W
  DALIGNER    8.8 W

GACT algorithm and hardware design

Strategies for long sequence alignment

  Algorithm               Time    Space (compute-intensive step)   Optimal
  Smith-Waterman          O(mn)   O(mn)                            Y
  Hirschberg              O(mn)   O(m+n)                           Y
  Banded Smith-Waterman   O(n)    O(n)                             N
  X-drop                  O(n)    O(n)                             N
  GACT                    O(n)    O(1)                             N

  m, n: sequence lengths, m >= n

Profound hardware design implications
Prior assumptions (hardware):
- Small upper bound on sequence length n, OR
- Trace-back of alignment in software (SLOW!)
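
To make the space column concrete, the sketch below counts how many dynamic-programming cells need trace-back state under three of the strategies. It is only a back-of-the-envelope illustration: the function name and the `band` and `tile` parameters are assumptions, and constant factors are ignored.

```python
def traceback_cells(m, n, tile=None, band=None):
    """Rough count of DP cells whose trace-back state must be stored.

    Hypothetical helper for illustration; m >= n are the sequence lengths.
    """
    if tile is not None:
        # Tiled (GACT-style): one tile at a time, independent of m and n.
        return tile * tile
    if band is not None:
        # Banded: a fixed-width strip along the diagonal, linear in n.
        return band * n
    # Full Smith-Waterman matrix.
    return m * n


for label, cells in [
    ("full matrix, 10 Kbp x 10 Kbp", traceback_cells(10_000, 10_000)),
    ("band of width 100", traceback_cells(10_000, 10_000, band=100)),
    ("tile of size 512", traceback_cells(10_000, 10_000, tile=512)),
]:
    print(f"{label}: {cells:,} cells")
```

Only the tiled count stays fixed as the reads get longer, which is what lets the trace-back state live in a small on-chip memory.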

Genome Alignment using Constant-memory Trace-back (GACT)
1. Initialize i_curr, j_curr in R, Q
2. Form a tile of maximum size T around i_curr, j_curr in R, Q
3. Align the tile with trace-back from i_curr, j_curr with at most (T-D) steps
4. Update i_curr, j_curr with the trace-back end coordinates
5. Repeat 2-4 until extension is no longer possible

Example (T = 5, D = 2; the figure marks Tile 1, Tile 2, Tile 3 and the successive (i_curr, j_curr) positions)
  Reference (R): G G C G A C T T T
  Query (Q):     G G T C G T T T

  Optimal alignment (Score = 11)
    G G - C G A C T T T
    G G T C G - - T T T

  GACT alignment (Score = 11)
    G G - C G A C T T T
    G G T C G - - T T T
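
The five steps above can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the Darwin implementation: it extends leftward from a given end point, scores each tile Smith-Waterman style with a linear gap penalty, and always starts the trace-back at the tile's bottom-right corner. The scoring values (match +2, mismatch -1, gap -1) are assumed; match +2 and gap -1 are consistent with the slide's score of 11 (7 matches, 3 gaps), while the mismatch value is a guess. The function names are hypothetical.

```python
def align_tile(r_tile, q_tile, max_steps, match=2, mismatch=-1, gap=-1):
    """Score one tile and trace back from its bottom-right corner.

    Returns (reference bases used, query bases used, aligned columns in
    right-to-left order). The trace-back stops after max_steps bases of
    either sequence, or when the local score drops to zero.
    """
    n, m = len(r_tile), len(q_tile)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # at most (T+1) x (T+1) cells
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if r_tile[i - 1] == q_tile[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    i, j, columns = n, m, []
    while H[i][j] > 0 and n - i < max_steps and m - j < max_steps:
        s = match if r_tile[i - 1] == q_tile[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # match or mismatch
            columns.append((r_tile[i - 1], q_tile[j - 1]))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:      # gap in the query
            columns.append((r_tile[i - 1], "-"))
            i -= 1
        else:                                   # gap in the reference
            columns.append(("-", q_tile[j - 1]))
            j -= 1
    return n - i, m - j, columns


def gact_extend_left(ref, query, i_curr, j_curr, T, D):
    """Tile-by-tile extension to the left of (i_curr, j_curr)."""
    columns = []
    while i_curr > 0 and j_curr > 0:
        r_tile = ref[max(0, i_curr - T):i_curr]
        q_tile = query[max(0, j_curr - T):j_curr]
        r_used, q_used, tile_columns = align_tile(r_tile, q_tile, T - D)
        if r_used == 0 and q_used == 0:
            break  # extension no longer possible
        columns.extend(tile_columns)
        i_curr, j_curr = i_curr - r_used, j_curr - q_used
    columns.reverse()
    ref_row = "".join(c[0] for c in columns)
    query_row = "".join(c[1] for c in columns)
    return i_curr, j_curr, ref_row, query_row


print(gact_extend_left("GGCGACTTT", "GGTCGTTT", 9, 8, T=5, D=2))
```

Because each call to `align_tile` allocates a table bounded by the tile size, the memory for the compute-intensive step does not grow with the sequence length; only the output alignment does. With the assumed scoring and tie-breaking order, the call above yields the same two alignment rows as the slide's example.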

GACT empirically provides optimal alignments
- GACT tile size T = 400
- GACT compared to optimal Smith-Waterman for 200,000 10 Kbp sequences with 4 different error rates: 10%, 20%, 30% and 40%
- Simple scoring (match: +1, mismatch: -1, gap: -1)
- At D = 120, all observed alignments were optimal

            Fraction of alignments non-optimal    Worst-case score loss
  D (bp)    10%     20%     30%     40%           10%     20%     30%     40%
  0         30.4%   61.0%   83.0%   94.7%         0.29%   0.67%   1.26%   2.38%
  30        0.0%    0.02%   0.55%   55.3%         0.0%    0.35%   0.63%   1.59%
  60        0.0%    0.0%    0.01%   1.38%         0.0%    0.0%    0.34%   0.81%
  90        0.0%    0.0%    0.0%    0.05%         0.0%    0.0%    0.0%    0.33%
  120       0.0%    0.0%    0.0%    0.0%          0.0%    0.0%    0.0%    0.0%

GACT Hardware-acceleration
[Figure: a tile with T = 9; the reference streams through a systolic array of PE 0 to PE 3, each with its own SRAM, plus trace-back (TB) logic; the query is split into Query Block 1, Query Block 2 and Query Block 3]
- Systolic array of N_pe (= 4) processing elements (PEs) solves Smith-Waterman-Gotoh
- For a tile with size T > N_pe, the query is divided into blocks and the reference is streamed through each block
- Computation exploits wave-front parallelism
- On-chip SRAM for storing trace-back state (4 bits per cell)
- Total SRAM size = 4 bits x (T_max)^2 => 128 KB for T_max = 512
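
The SRAM sizing and the wave-front idea can both be checked with a few lines. The sketch below is an illustration only: the mapping "PE p holds query row p and scores reference column c - p at cycle c" is the usual textbook systolic-array schedule and is an assumption here, not a description of Darwin's actual pipeline. Both function names are hypothetical.

```python
def traceback_sram_bytes(t_max, bits_per_cell=4):
    """Bytes needed to hold trace-back state for one T_max x T_max tile."""
    return t_max * t_max * bits_per_cell // 8


def wavefront_schedule(n_pe, ref_len):
    """Cycle-by-cycle (PE, reference column) pairs for one query block.

    Assumed mapping: PE p holds query row p of the block; at cycle c it
    scores reference column c - p, so all cells on one anti-diagonal are
    computed in the same cycle.
    """
    schedule = []
    for cycle in range(ref_len + n_pe - 1):
        active = [(pe, cycle - pe) for pe in range(n_pe) if 0 <= cycle - pe < ref_len]
        schedule.append(active)
    return schedule


print(traceback_sram_bytes(512) // 1024, "KB")  # 4 bits x 512^2 cells
for cycle, active in enumerate(wavefront_schedule(n_pe=4, ref_len=9)):
    print(cycle, active)
```

With 4 PEs and a 9-base reference, the array is fully busy from cycle 3 onward and one query block takes 12 cycles, compared with 36 cell updates if done serially.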

Darwin: GACT Performance
[Figure: alignments/sec (log scale, 1 to 1,000,000) versus sequence length (1 to 10 Kbp) for GACT (Software), Edlib and GACT (Darwin); data labels on the plot: 574K, 108K, 54K and 302X, 591X, 986X, 35X, 19X, 11X]
- Runtime scales linearly with sequence length
- 300-1000X faster than Edlib
- 10,000X faster than the software implementation of GACT

D-SOFT algorithm and hardware design

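Seed Position table based exact matching
R = AGCTATACTA
Q = GCTA

Seed position table for R (2 bp seeds; AA, CA, CC, CG, GA, GG, GT, TC, TG and TT have no positions)
  Seed   Positions
  AC     6
  AG     0
  AT     4
  CT     2, 7
  GC     1
  TA     3, 5, 8

Seed hits for Q
  GC: 1
  CT: 2, 7
  TA: 3, 5, 8

[Figure: seed hits plotted with R (positions 1 to 8) on the horizontal axis and Q (offsets 0 to 3) on the vertical axis; a line with slope = 1 passes through the matching hits]

For the human genome, seed position table size > 12 GB (4 B x 3 x 10^9)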
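
The table above can be reproduced with a short script. This is a minimal sketch using an in-memory dictionary; the function names are hypothetical, and a real seed position table for a genome would use a compact array layout rather than Python lists.

```python
from collections import defaultdict


def build_seed_table(ref, k):
    """Map every k-mer (seed) in ref to the sorted list of its start positions."""
    table = defaultdict(list)
    for pos in range(len(ref) - k + 1):
        table[ref[pos:pos + k]].append(pos)
    return dict(table)


def seed_hits(table, query, k):
    """Look up each k-mer of the query; returns (query offset, seed, positions)."""
    hits = []
    for offset in range(len(query) - k + 1):
        seed = query[offset:offset + k]
        hits.append((offset, seed, table.get(seed, [])))
    return hits


table = build_seed_table("AGCTATACTA", k=2)
for offset, seed, positions in seed_hits(table, "GCTA", k=2):
    print(offset, seed, positions)

# Size estimate from the slide: one 4-byte position per reference base.
print(4 * 3 * 10**9 / 1e9, "GB")
```

The hits GC at 1, CT at 2 and TA at 3 share the same difference between reference position and query offset, which is why they fall on one slope-1 diagonal in the plot.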

Diagonal-band Seed Overlapping based Filtration Technique (D-SOFT)
[Figure: example with N_B = 6, N = 10, k = 4, h = 7; Query (Q) offsets 1 to 10 on the vertical axis, Reference (R) divided into Bin 1 to Bin 6 on the horizontal axis; values shown for the bins: 6, 5, 9, 4, 0, 5]
- Divide R into N_B bins (diagonal bands)
- Use N seeds of size k bp from different offsets in Q
- Look up the positions of the seeds in R and assign each seed hit to the corresponding bin (diagonal band)
- Count the non-overlapping Q base pairs covered by seed hits for each bin and filter based on threshold h (same as DALIGNER)
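
The bullets above can be sketched in software as below. This is an illustrative reading of the slide, not the hardware design: the bin is chosen from the diagonal (reference position minus query offset) using a hypothetical `bin_width` parameter, seeds are processed in increasing query offset, and overlap with the previous hit in the same bin is subtracted so each query base is counted once per bin.

```python
from collections import defaultdict


def build_seed_table(ref, k):
    """Map every k-mer in ref to the list of its start positions."""
    table = defaultdict(list)
    for pos in range(len(ref) - k + 1):
        table[ref[pos:pos + k]].append(pos)
    return table


def dsoft_filter(ref, query, k, offsets, bin_width, h):
    """Return {bin: covered query bases} for bins reaching threshold h."""
    table = build_seed_table(ref, k)
    bp_count = defaultdict(int)   # non-overlapping query bases per bin
    last_hit = {}                 # query offset of the last seed hit per bin
    for j in sorted(offsets):
        seed = query[j:j + k]
        if len(seed) < k:
            continue  # offset too close to the end of the query
        for i in table.get(seed, []):
            b = (i - j) // bin_width              # diagonal band index
            prev = last_hit.get(b)
            overlap = 0 if prev is None else max(0, prev + k - j)
            bp_count[b] += k - overlap
            last_hit[b] = j
    return {b: count for b, count in bp_count.items() if count >= h}


ref = "CCCCCGATTACACCCCC"
query = "GATTACA"
print(dsoft_filter(ref, query, k=3, offsets=range(5), bin_width=4, h=6))
```

In this toy input all five seeds hit the same diagonal band and together cover all 7 query bases, so that band passes a threshold of 6 and would be handed to the aligner as a candidate.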

D-SOFT hardware-acceleration design
- Area: 264 mm^2
- Power: 7.3 W
- Random accesses to update bins use on-chip SRAM (bin count SRAM)
- Area and power are both dominated by the 64 MB bin count SRAM
- Hardware exploits DRAM channel parallelism for seed position lookup

D-SOFT hardware-acceleration throughput

       Avg. hits per seed    Throughput (10^3 seeds/sec)
  k    (Human Genome)        Software     Darwin       Darwin speedup
  11   1765                  7.9          760.6        96.3X
  12   457                   29.1         2,796.2      96.1X
  13   118                   136.1        9,126.3      67.1X
  14   32                    339.0        21,271.1     62.7X
  15   8                     784.3        34,166.7     43.5X

- ~2X speedup from parallel DRAM channels
- ~3X reduction in the number of memory accesses to the DRAM
- All random memory accesses to update bins use on-chip SRAM (64 MB)
- On-chip updates completely hide off-chip (DRAM) bandwidth
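
The speedup column is the ratio of the two throughput columns, which can be recomputed as a quick sanity check. The values below are copied from the table; the variable names are just for illustration.

```python
# (k, software throughput, Darwin throughput), both in 10^3 seeds/sec
rows = [
    (11, 7.9, 760.6),
    (12, 29.1, 2796.2),
    (13, 136.1, 9126.3),
    (14, 339.0, 21271.1),
    (15, 784.3, 34166.7),
]

speedups = {k: darwin / software for k, software, darwin in rows}
for k, speedup in speedups.items():
    print(f"k={k}: {speedup:.2f}X")
```

The recomputed ratios agree with the slide to within 0.1X; the small difference for k = 15 presumably comes from rounding in the published throughput figures.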

Long read assembly on Darwin

Darwin: Read assembly
[Figure: two panels, Reference-guided and De novo]

Darwin: Performance Results

Reference-guided (54X human genome)
  Read error   D-SOFT settings     Sensitivity
  rate         (k, N, h)           Baseline    Darwin     Speedup
  15%          (14, 750, 24)       95.95%      99.91%     4,110X
  30%          (12, 1000, 25)      98.11%      98.40%     4,088X
  40%          (11, 1300, 22)      97.10%      97.40%     128X
  Baseline: BWA-MEM (15%), GraphMap (30%, 40%)

De novo (54X human genome)
  Read error   D-SOFT settings     Sensitivity
  rate         (k, N, h)           Baseline    Darwin     Speedup (bottleneck)
  15%          (14, 1300, 24)      99.80%      99.89%     264X
  Baseline: DALIGNER

Thank you! Questions or feedback?