Accelerating Genomic Computations 1000X with Hardware
Transcription
1 Accelerating Genomic Computations 1000X with Hardware
Yatish Turakhia, EE PhD candidate, Stanford University
Prof. Bill Dally (Electrical Engineering and Computer Science)
Prof. Gill Bejerano (Computer Science, Developmental Biology and Pediatrics)
2 DNA sequencing costs and data explosion
[Figure labels: 1st gen, 2nd gen, 3rd gen]
Since 2003, genomics data doubling every 7 months! Exabyte data by M to 2B genomes to be sequenced!
Stephens, Zachary D., et al. "Big data: astronomical or genomical?" PLoS Biology (2015)
"Storing and processing genome data will exceed the computing challenges of running YouTube and Twitter, biologists warn." [Nature News, 2015]
"The decreasing cost of sequencing and the increasing number of sequence reads being generated are placing greater demand on the computational resources and knowledge necessary to handle sequence data." [Genome Biology, 2016]
3 Genomic Granular Computing Applications
Neonatal ICU
- 4 million newborns per year in the US alone
- 1 in 33 newborns with rare genetic conditions admitted to NICU
- Time of essence for genome-based diagnosis
Prenatal ICU and IVF clinics
- Non-invasively diagnose for over 3,000 rare genetic conditions (e.g. Down Syndrome)
- Free-floating DNA in blood: enormous volume!
Liquid Biopsy
- Early cancer detection: life-saving application for millions of individuals
- Non-invasive circulating tumor DNA
- Periodic sequencing of healthy individuals: enormous volume!
4 Patient Diagnosis: Sample-to-answer
1. Genome Sequencing Machine: Patient -> Reads (ATGTCGAT, CGATACGA, GAGTCATC, ACTGACGT)
2. Read assembly: Reads -> Genome (3 billion base pairs)
3. Find the causal mutation: Genome -> Mutations
REFERENCE: --ATGTCGATGATCCAGAGGATACTAGGATAT-
PATIENT:   --ATGTCTATGATC--GAGGATATTAGGATAT-
- Long reads (>10Kbp) offer a better resolution of the mutation spectrum but have a high error rate (15-40%)
- >1,300 CPU hours for reference-guided assembly of noisy long reads: 14.2M CPU-years for 100M individuals
- >15,600 CPU hours for de novo assembly of noisy long reads: 178M CPU-years for 100M individuals
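The population-scale figures on this slide follow from a simple unit conversion, sketched below. The function name and the 365-day year are assumptions made for illustration, not taken from the talk. With the rounded per-genome inputs quoted on the slide, the de novo case comes out at roughly 178M CPU-years, while the reference-guided case comes out near 14.8M, so the slide's 14.2M presumably rests on slightly different inputs.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: a 365-day year


def cpu_years(cpu_hours_per_genome: float, individuals: int) -> float:
    """Total CPU-years needed to assemble one genome per individual."""
    return cpu_hours_per_genome * individuals / HOURS_PER_YEAR


# Per-genome CPU hours and cohort size as quoted on the slide.
reference_guided = cpu_years(1_300, 100_000_000)
de_novo = cpu_years(15_600, 100_000_000)

print(f"reference-guided: {reference_guided / 1e6:.1f}M CPU-years")
print(f"de novo:          {de_novo / 1e6:.1f}M CPU-years")
```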
5 Darwin: A Genomics Co-processor
[Diagram: Query (Q) and Reference (R) pass through D-SOFT (filter, via the D-SOFT API) and GACT (aligner, via the GACT API) inside Darwin, driven by software]
High speed and programmability
1. D-SOFT: Tunable speed/precision to match any error profile
2. GACT: First algorithm with O(1) memory for the compute-intensive step of alignment, allowing arbitrarily long alignments in hardware; ideal for long reads
3. First framework shown to accelerate reference-guided as well as de novo assembly of reads in hardware
6 Darwin: 40nm ASIC configuration
[Diagram: software reaches Darwin through the D-SOFT API and GACT API; Darwin is drawn with one D-SOFT block, eight GACT blocks and two LPDDR4 (32GB) memories]
Software (Intel Xeon E5), power (1 thread):
Algorithm    Power
BWA-MEM      9.2W
GraphMap     10.7W
DALIGNER     8.8W
Area: 300 mm^2, Power: 9W
7 GACT algorithm and hardware design
8 Strategies for long sequence alignment
Algorithm                Time     Space (compute-intensive step)   Optimal
Smith-Waterman           O(mn)    O(mn)                            Y
Hirschberg               O(mn)    O(m+n)                           Y
Banded Smith-Waterman    O(n)     O(n)                             N
X-drop                   O(n)     O(n)                             N
GACT                     O(n)     O(1)                             N
m, n: sequence lengths, m >= n
Profound hardware design implications
Prior assumptions (hardware): small upper bound on sequence length n, OR trace-back of alignment in software (SLOW!)
9 Genome Alignment using Constant-memory Trace-back (GACT)
1. Initialize I_curr, J_curr in R, Q
2. Form tile of maximum size T around I_curr, J_curr in R, Q
3. Align tile with trace-back from I_curr, J_curr with at most (T-D) steps
4. Update I_curr, J_curr with trace-back end coordinates
5. Repeat 2-4 till extension no longer possible
Example (T = 5, D = 2; Tiles 1 to 3)
Query (Q):     * G G T C G T T T
Reference (R): * G G C G A C T T T
Optimal alignment (Score = 11):
G G - C G A C T T T
G G T C G - - T T T
GACT alignment (Score = 11):
G G - C G A C T T T
G G T C G - - T T T
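The five steps above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the Darwin implementation. It extends from the ends of both sequences toward their starts and uses a linear gap penalty with the simple scoring quoted elsewhere in the talk (match +1, mismatch -1, gap -1). It also treats every tile the same way, tracing back from the tile's bottom-right corner. The names `align_tile`, `gact`, `tile_size` and `overlap` (T and D on the slide) are hypothetical. The point to notice is that the dynamic-programming matrix never exceeds T x T cells, whatever the sequence lengths.

```python
DIAG, UP, LEFT = 0, 1, 2


def align_tile(r_tile, q_tile, max_steps, match=1, mismatch=-1, gap=-1):
    """Fill one tile and trace back from its bottom-right corner.

    Stops once max_steps bases of either sequence are consumed or a tile
    edge is reached. Returns (r_consumed, q_consumed, ops in trace-back order).
    """
    n, m = len(r_tile), len(q_tile)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[DIAG] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if r_tile[i - 1] == q_tile[j - 1] else mismatch
            best, move = score[i - 1][j - 1] + sub, DIAG
            if score[i - 1][j] + gap > best:
                best, move = score[i - 1][j] + gap, UP
            if score[i][j - 1] + gap > best:
                best, move = score[i][j - 1] + gap, LEFT
            score[i][j], ptr[i][j] = best, move

    i, j, ops = n, m, []
    while i > 0 and j > 0 and n - i < max_steps and m - j < max_steps:
        move = ptr[i][j]
        if move == DIAG:
            ops.append("M" if r_tile[i - 1] == q_tile[j - 1] else "X")
            i, j = i - 1, j - 1
        elif move == UP:      # reference base aligned to a gap in the query
            ops.append("D")
            i -= 1
        else:                 # query base aligned to a gap in the reference
            ops.append("I")
            j -= 1
    return n - i, m - j, ops


def gact(r, q, tile_size, overlap):
    """Tile-by-tile extension; only one tile_size x tile_size matrix is live."""
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must satisfy 0 <= overlap < tile_size")
    i_curr, j_curr, ops = len(r), len(q), []
    while i_curr > 0 and j_curr > 0:
        r_tile = r[max(0, i_curr - tile_size):i_curr]
        q_tile = q[max(0, j_curr - tile_size):j_curr]
        i_off, j_off, tile_ops = align_tile(r_tile, q_tile, tile_size - overlap)
        ops.extend(tile_ops)
        i_curr, j_curr = i_curr - i_off, j_curr - j_off
    return i_curr, j_curr, "".join(reversed(ops))


# Sequences and parameters from the slide's example (T = 5, D = 2).
print(gact("GGCGACTTT", "GGTCGTTT", tile_size=5, overlap=2))
```

Because each tile commits at most T-D trace-back steps, consecutive tiles overlap by at least D cells, which is what lets a later tile revise a poor choice made near the previous tile's edge.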
10 GACT empirically provides optimal alignments
- GACT tile size T=400
- GACT compared to optimal Smith-Waterman for 200,000 10Kbp sequences with 4 different error rates: 10%, 20%, 30% and 40%
- Simple scoring (match: +1, mismatch: -1, gap: -1)
- At D=120, all observed alignments were optimal
Table columns: D (in bp); fraction of alignments nonoptimal at 10%, 20%, 30%, 40% error; worst-case score loss at 10%, 20%, 30%, 40% error
% 61.0% 83.0% 94.7% 0.29% 0.67% 1.26% 2.38%
% 0.02% 0.55% 55.3% 0.0% 0.35% 0.63% 1.59%
% 0.0% 0.01% 1.38% 0.0% 0.0% 0.34% 0.81%
% 0.0% 0.0% 0.05% 0.0% 0.0% 0.0% 0.33%
% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
11 GACT Hardware-acceleration
[Diagram: reference streamed through PE 0 to PE 3, each with its own SRAM, plus TB logic; query split into Query Blocks 1 to 3; T = 9]
- Systolic array of N_pe (= 4) processing elements (PEs) solves Smith-Waterman-Gotoh
- Tile with size T > N_pe: query divided into blocks, reference streamed through each block
- Computation exploits wave-front parallelism
- On-chip SRAM for storing trace-back state (4 bits per cell)
- Total SRAM size = 4 bits x (T_max)^2 => 128KB for T_max =
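The trace-back storage formula on this slide (4 bits per cell over a T_max x T_max tile) is easy to evaluate, as in the sketch below. The function name is hypothetical. The slide's own T_max value is cut off in this transcript; the 512 used here is only a value that happens to yield 128 KB under the formula, and 400 is the tile size quoted for the accuracy experiments.

```python
def traceback_sram_bytes(t_max: int, bits_per_cell: int = 4) -> int:
    """Bytes needed to hold trace-back pointers for one t_max x t_max tile."""
    return bits_per_cell * t_max * t_max // 8


# T = 400 is the tile size used in the accuracy experiments.
print(traceback_sram_bytes(400))          # 80000 bytes

# Assumption: 512 is chosen only because it gives 128 KB under this formula.
print(traceback_sram_bytes(512) // 1024)  # 128 (KB)
```

The requirement depends only on the tile size, not on the read length, which is why the slide can quote one fixed SRAM budget.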
12 Darwin: GACT Performance
[Plot: alignments/sec versus sequence length (Kbp) for GACT (Software), Edlib and GACT (Darwin); surviving data labels: 108K, 54K, 19X, 986X, 11X]
- Runtime scales linearly with sequence length
- X faster than Edlib
- 10,000X faster than software implementation of GACT
13 D-SOFT algorithm and hardware design
14 Seed Position table based exact matching
R = AGCTATACTA
Seed   Positions
AC     6
AG     0
AT     4
CT     2, 7
GC     1
TA     3, 5, 8
(AA, CA, CC, CG, GA, GG, GT, TC, TG, TT: no positions)
Q = GCTA
Seed hits for Q: GC: 1; CT: 2, 7; TA: 3, 5, 8
[Plot: seed hits of Q against R; label: Slope=]
For human genome, seed position table size > 12GB (4B x 3 x 10^9)
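The slide's example can be reproduced with a small dictionary-based seed position table. This is a sketch for illustration only; the function names are hypothetical, and a production table for the human genome would use a packed array layout, not Python lists.

```python
from collections import defaultdict


def build_seed_table(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the sorted list of its start positions."""
    table = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        table[reference[pos:pos + k]].append(pos)
    return dict(table)


def seed_hits(query: str, table: dict, k: int) -> list:
    """For each k-mer of the query, list (query offset, seed, reference positions)."""
    return [
        (j, query[j:j + k], table.get(query[j:j + k], []))
        for j in range(len(query) - k + 1)
    ]


table = build_seed_table("AGCTATACTA", k=2)
hits = seed_hits("GCTA", table, k=2)
for offset, seed, positions in hits:
    print(offset, seed, positions)

# Size estimate from the slide: one 4-byte position per base of a 3 x 10^9 bp genome.
print(4 * 3 * 10**9 / 1e9, "GB")
```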
15 Diagonal-band Seed Overlapping based Filtration Technique (D-SOFT)
[Diagram: Query (Q) against Reference (R), divided into Bin 1 to Bin 6; N_B = 6, N = 10, k = 4, h = 7]
- Divide R into N_B bins (diagonal bands)
- Use N seeds of size k bp from different offsets in Q
- Look up positions of seeds in R and assign each seed hit to the corresponding bin (diagonal band)
- Count non-overlapping Q base-pairs covered by seed hits for each bin and filter based on threshold h (same as DALIGNER)
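The filtering steps listed above can be sketched as below. This is a simplified software illustration and not the hardware design. It assigns bins by flooring the diagonal (reference position minus query offset) by a fixed `bin_size`, takes seeds at consecutive query offsets controlled by `stride`, and reports a bin the first time its count of non-overlapping covered query bases reaches `h`. All of these names, the binning rule and the toy sequences are assumptions made for the example.

```python
from collections import defaultdict


def build_seed_table(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the list of its start positions."""
    table = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        table[reference[pos:pos + k]].append(pos)
    return dict(table)


def dsoft(reference, query, k, n_seeds, h, bin_size, stride=1):
    """Return (bin, reference position, query offset) for each bin reaching h."""
    table = build_seed_table(reference, k)
    bp_count = defaultdict(int)   # non-overlapping query bases covered, per bin
    last_hit_pos = {}             # query offset of the most recent hit, per bin
    candidates = []
    for s in range(n_seeds):
        j = s * stride
        if j + k > len(query):
            break
        for i in table.get(query[j:j + k], []):
            b = (i - j) // bin_size               # diagonal band of this hit
            last = last_hit_pos.get(b)
            overlap = 0 if last is None else max(0, last + k - j)
            before = bp_count[b]
            bp_count[b] = before + (k - overlap)  # count only newly covered bases
            last_hit_pos[b] = j
            if before < h <= bp_count[b]:         # threshold crossed just now
                candidates.append((b, i, j))
    return candidates


# Toy data: the query is an exact copy of reference[6:16].
REFERENCE = "TTTTTT" + "ACGTAGCATGCCAG" + "TTTTTT"
QUERY = "ACGTAGCATG"
candidates = dsoft(REFERENCE, QUERY, k=4, n_seeds=7, h=7, bin_size=5)
print(candidates)
```

Counting covered bases instead of raw seed hits means that several overlapping seeds on one diagonal do not inflate a bin's score, so the threshold h behaves consistently across seed densities.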
16 D-SOFT hardware-acceleration design
Area: 264 mm^2, Power: 7.3W
- Random accesses to update bins use on-chip SRAM (bin count SRAM)
- Area and power both dominated by the 64MB bin count SRAM
- Hardware exploits DRAM channel parallelism for seed position lookup
17 D-SOFT hardware-acceleration throughput
Table columns: k; Avg. hits per seed (Human Genome); Throughput (10^3 seeds/sec) for Software and Darwin; Darwin speedup
- ~2X speedup from parallel DRAM channels
- ~3X reduction in number of memory accesses to the DRAM
- All random memory accesses to update bins use on-chip SRAM (64MB)
- On-chip updates completely hide off-chip (DRAM) bandwidth
18 Long read assembly on Darwin
19 Darwin: Read assembly
[Figure panels: Reference-guided, De novo]
20 Darwin: Performance Results
Reference-guided (54X human genome)
Read error rate   D-SOFT settings (k, N, h)   Baseline sensitivity   Darwin sensitivity   Speedup
15%               (14, 750, 24)               95.95%                 99.91%               4,110X
30%               (12, 1000, 25)              98.11%                 98.40%               4,088X
40%               (11, 1300, 22)              97.10%                 97.40%               128X
Baseline: BWA-MEM (15%), GraphMap (30%, 40%)
De novo (54X human genome)
Read error rate   D-SOFT settings (k, N, h)   Baseline sensitivity   Darwin sensitivity   Speedup (Bottleneck)
15%               (14, 1300, 24)              99.80%                 99.89%               264X
Baseline: DALIGNER
21 Thank you! Questions or feedback?