HRG Insight: IBM & Intel: Intelligent Choice for Life Sciences

IBM eX5 server technology significantly advances the state of Life Sciences research applications such as Bioinformatics, Genomic Research, and Translational Medicine. Workloads in Life Sciences, such as genomic sequencing, assembly, alignment, and secondary data analysis, gain significant benefits from the memory capacity of IBM eX5 in combination with the improvements in transaction throughput enabled by the new Intel Xeon processor 7500 series. IBM customers will realize significant competitive advantage through faster ROI, reduced TCO, and improved time to result.

While the institutions and organizations that conduct full genome sequencing experiments today do perform primary analysis, they may not perform the secondary analysis. It is this focused secondary analysis that looks for the very small percentage of DNA variants that form the basis for research discoveries and insights. This is a precursor to the development and delivery of clinical applications to identify, diagnose, prevent, and cure disease. Secondary data analysis is emerging as the area where right-sized compute- and data-intensive solutions like IBM's eX5, powered by the Intel Xeon processor 7500 series, start to provide the computational throughput and very large memory capacity needed to make genomic data useful to Life Sciences companies, physicians, and patients.

Copyright 2010 Harvard Research Group, Inc.

Selected HPC Life Science Disciplines & Workloads

The Life Sciences industry segment can be characterized by an exponential growth in the volumes of raw and processed data in addition to an insatiable demand for compute power and throughput. Today High Performance Computing (HPC) techniques and technologies in Life Sciences are being applied to workloads and solutions of large scale problems. The term workload is used here to describe the work being done, the relevant data characteristics, and the software used to manipulate the data. The following table provides a perspective for the discussion of HPC workload requirements for Life Sciences and in particular for genomic sequencing.

Discipline: Bioinformatics - Sequence Analysis
Solutions: Searching, alignment & pattern matching of biological sequences (DNA & protein)
Data/Application Characteristics: Structured data. Integer dominant, frequency dependent, large caches & memory BW not critical, some algorithms are suited to SIMD acceleration
Major Applications: NCBI BLAST, wublast, ClustalW, FASTA, HMMER

Discipline: Bioinformatics - Sequence Assembly
Solutions: Align & merge DNA fragments to reconstruct the original sequence
Data/Application Characteristics: Usually have large memory footprint
Major Applications: Phrap/phred, CAP3/PCAP, Velvet, ABySS, SOAPdenovo, MAQ, BOWTIE, BFAST, SOAP, Eland, SHRiMP, GAP, pgap (TAMU)

Discipline: Biochemistry - Drug Discovery
Solutions: Screening of large database libraries of potential drugs for ones with desired biological activity
Data/Application Characteristics: Mostly floating point, very compute intensive, highly parallel
Major Applications: Autodock, GLIDE, Dock, Flexx, FTDock, LigandFit

Discipline: Computational Chemistry - Molecular Modeling & Quantum Mechanics
Solutions: Modeling of biological molecules using Molecular Dynamics & Quantum Mechanics techniques
Data/Application Characteristics: Very floating point intensive, latency critical, frequency dependent, scalable to low 100s
Major Applications: AMBER, NAMD, CHARMM / CHARMm, Desmond, GROMACS, Gaussian, GAMESS, Jaguar, NWCHEM

Discipline: Proteomics
Solutions: Interpreting mass spectrometry data and matching the spectra to protein database
Data/Application Characteristics: Mostly integer dominant, frequency dependent. Not communication intensive
Major Applications: Mascot, Sequest, ProteinProspector, X!Tandem, OMSSA

(Source IBM)

Sequencing the Human Genome

Sequencing the human genome - there are nearly three billion DNA base pairs in an individual human genome - is laborious, costly, and time consuming. Today the genome sequencing / biosciences industry segment is dominated by research institutes that generate a tremendous amount of data. These organizations require significant computation capabilities and data storage capacity in order to process and analyze that data.
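To give a feel for the data volumes behind the figure of nearly three billion base pairs, the following is a rough back-of-envelope sketch in Python. The 2-bit packing, the two-bytes-per-base read layout, and the 30x coverage value are illustrative assumptions and do not come from the report. Real sequencer output also includes image data and is far larger.

```python
# Back-of-envelope storage estimate for one human genome.
# Assumption: each base (A, C, G, T) is packed into 2 bits. Real sequencer
# output is far larger because it carries quality scores, redundant reads,
# and image data.

BASE_PAIRS = 3_000_000_000   # "nearly three billion" base pairs (from the text)
BITS_PER_BASE = 2            # assumption: 4 possible bases -> 2 bits each


def packed_genome_bytes(base_pairs: int = BASE_PAIRS,
                        bits_per_base: int = BITS_PER_BASE) -> int:
    """Return the bytes needed to store the bases alone, with no metadata."""
    total_bits = base_pairs * bits_per_base
    return (total_bits + 7) // 8  # round up to whole bytes


def raw_read_bytes(coverage: int, bytes_per_base: int = 2,
                   base_pairs: int = BASE_PAIRS) -> int:
    """Estimate raw read volume when every base is read `coverage` times and
    stored as one character plus one quality byte (hypothetical layout)."""
    return base_pairs * coverage * bytes_per_base


if __name__ == "__main__":
    print(packed_genome_bytes() / 1e6, "MB for the packed sequence")
    print(raw_read_bytes(coverage=30) / 1e9, "GB of raw reads at 30x coverage")
```

Under these assumptions the finished sequence is small (about 750 MB), while the intermediate read data is hundreds of times larger, which is why storage and memory capacity dominate the workload requirements.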

"The $10 million X PRIZE for Genomics prize purse will be awarded to the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 per genome." (Source http://genomics.xprize.org/archon x prize for genomics/prize overview ). Since 2003 when the cost to sequence a single human genome was roughly $3 billion, dramatic reductions in the cost per genome sequenced have continued. Much of this cost reduction is due to applied technology such as the new HPC Life Science solution enabling capabilities provided by IBM ex5 systems. When the cost per complete genome sequenced gets down to $10,000 the shift from research to the clinical / commercial application of this science should begin and personalized medicine can then become a reality. In genome sequencing raw "DNA base" data produced by the sequencing appliance results in the generation of extreme volumes of data (multiple terabytes) that must then be stored for further analysis. After the assembled and aligned genomic image data has been stored, secondary analysis and research on the data can be performed. This area of secondary analysis is a key focus for researchers and technology providers. Copyright 2010 Harvard Research Group, Inc page 3

Genome Sequencing

Whole or full genome sequencing of the DNA bases or segments that constitute each human genome produces very large volumes of data, which then have to be stored for secondary processing and analysis such as the identification of genetic markers that serve as indicators for a specific disease. This type of secondary analysis requires significant high performance compute power, memory, and storage capacity. High throughput full genome sequencing is becoming a reality with the development and ongoing refinement of technologies such as IBM's nanopore "DNA transistor". The promise of nanopore based genomic sequencing is to sequence whole strands of DNA, dramatically increase sequencing throughput and accuracy, and move the cost of sequencing a single human genome to less than $1,000. The result will be a significant increase in the volume of available genome sequence data. Populating a human genome data repository will dramatically improve the identification of the genetic and proteomic basis for disease before a disease presents itself. This in turn will facilitate proactive prevention and treatment through lifestyle counseling, personalized medicine, and even custom personalized drug therapies. The patent for Nanopore technology is held by Harvard University and Oxford Nanopore Technologies.

Second Level Analysis

The second level analysis of clean, error free, deduplicated genomic data presents the next big challenge that Life Science researchers will need to meet. This type of workload - typified by high volume in-memory data analysis - will drive HPC symmetric multiprocessor (SMP) server and clustered system utilization. The primary emerging computational requirement driven by this type of workload is the ability to load massive amounts of data and process it through large in-memory data manipulation. This big data workload requirement drives the consumption of flash drive based data stores and high memory capacity systems. An additional requirement will be for higher frequency, higher throughput multi-core chips to deal with multi-threading SMP related throughput requirements. Overall IT requirements are driven by the need to produce meaningful results in the least possible amount of time and for the lowest possible cost.

Personalized Medicine

Genomic sequencing is a prerequisite to the development of personalized medicine, which will be based on the combination of data derived from patient clinical history and genomic sequencing. Getting the cost of sequencing a complete human genome under $10,000 is a requirement for the field of personalized medicine to open up. As the repository of genome sequencing data is established, researchers will be able to analyze anomalies or differences of one genome compared to the base level mapping of the entire human genome or an equivalent proxy. Through this type of analysis researchers will identify and decipher disease specific genetic markers, resulting in individual / personalized preventative and therapeutic medical applications. The field of personalized medicine will not realize its full potential until these genomic and clinical history data repositories have been established and populated. Security, privacy, disaster recovery, availability, and other emerging concerns will have to be addressed in order to pave the way for fulfilling the promise that personalized medicine holds.

Personalized medicine will be based on the combination of personal, historic, clinical, and genomic data in a single source. This could take the form of a machine readable card, subcutaneous RFID chip, or similar device that an individual could carry with them. This device will have the individual's unique genome encoded along with other relevant personal information.
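The comparison of one genome against a reference mapping, described above as the way disease specific markers will be identified, can be illustrated with a deliberately small sketch. It assumes the two sequences are already aligned, have equal length, and differ only by single-base substitutions. The marker table and its labels are hypothetical. Production pipelines use dedicated alignment and variant calling tools and hold far larger data sets in memory.

```python
def find_variants(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumption: both sequences are pre-aligned and of equal length, so
    insertions and deletions are not handled here.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, smp_base)
        for pos, (ref_base, smp_base) in enumerate(zip(reference, sample))
        if ref_base != smp_base
    ]


def match_markers(variants: list[tuple[int, str, str]],
                  marker_table: dict[tuple[int, str], str]) -> list[str]:
    """Look up each variant in a hypothetical table of known markers,
    keyed by (position, observed base)."""
    return [
        marker_table[(pos, smp_base)]
        for pos, _ref_base, smp_base in variants
        if (pos, smp_base) in marker_table
    ]


if __name__ == "__main__":
    variants = find_variants("ACGTACGT", "ACGTTCGT")
    print(variants)
    print(match_markers(variants, {(4, "T"): "example-marker"}))
```

Because every position of the sample must be compared against the reference, the whole data set has to be resident in memory or streamed from fast storage, which matches the large-memory requirement described under Second Level Analysis.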
When this individual goes to a pharmacy, for example, a pharmacist will be able to cross-check the encoded genome, clinical historical data, and existing prescriptions in order to ensure that any new prescription does not conflict with existing conditions or treatments. This type of information, when available, will be a game changer in terms of enhancing the physician's clinical therapeutic efficacy.

Cluster and/or SMP

IBM System eX5 servers with the Intel Xeon processor 7500 series can run either SMP-type HPC workloads or applications written with MPI. This results in higher levels of system utilization by helping avoid situations where clusters sit idle because enabling an application for a cluster costs more than enabling it for an SMP system. Life Sciences applications such as ABySS (MPI code) are specifically written for distributed memory environments and for workloads of this type. HRG expects to see small workgroup cluster systems of up to eight nodes being replaced by next generation SMP/MPI capable servers such as IBM's eX5 systems. The versatility of these new systems will allow customers to run in either SMP mode or MPI mode, enabling significantly higher overall system utilization and making the IBM eX5 server a true all-in-one solution.

IBM eX5

IBM's eX5 servers bring to market major solution building block elements (x3850 X5, BladeCenter HX5, and x3690 X5) designed to meet the continually increasing Life Sciences HPC workload requirements of labs, universities, and corporate commercial R&D. The eX5 builds on the Intel Xeon processor 7500 series with increased memory capacity, flexible storage, virtualization, and system reliability for the Life Sciences HPC market. This offering delivers the compute power, memory capacity, and bandwidth to solve big science problems faster. The new eX5 HPC compute infrastructure servers, in combination with Intel's hyper-threading multi-core processors, satisfy Life Science application requirements for large memory, SMP, and parallelism.
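The split-work-then-merge pattern that underlies both the SMP and MPI modes discussed above can be sketched on a toy problem, counting k-mers (substrings of length k) in a sequence. This is an assumption-laden illustration and not how ABySS or any named application is implemented. Python threads stand in for the cores of an SMP system and, because of the interpreter's global lock, do not deliver a real speedup. In an MPI setting each chunk would instead be sent to a separate process with its own memory.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_kmers(seq: str, k: int) -> Counter:
    """Count every substring of length k in seq (serial baseline)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def split_with_overlap(seq: str, k: int, parts: int) -> list[str]:
    """Split seq into at most `parts` chunks that overlap by k - 1 bases,
    so that each k-mer start position falls in exactly one chunk."""
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return []
    step = -(-n_kmers // parts)  # ceiling division
    return [
        seq[start:min(start + step, n_kmers) + k - 1]
        for start in range(0, n_kmers, step)
    ]


def parallel_count_kmers(seq: str, k: int, workers: int = 4) -> Counter:
    """Count k-mers per chunk on a pool of workers, then merge the results."""
    chunks = split_with_overlap(seq, k, workers)
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda chunk: count_kmers(chunk, k), chunks):
            total.update(partial)
    return total


if __name__ == "__main__":
    sequence = "ACGTACGTAC"
    print(parallel_count_kmers(sequence, 3) == count_kmers(sequence, 3))
```

On a shared-memory system the workers can all read the same sequence in place, whereas a distributed-memory version must ship each chunk to its node and gather the partial counts afterwards. That extra communication is part of the application-enabling cost the report attributes to clusters.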
Expanded memory capabilities for faster results

IBM silicon allows processors on eX5 systems to access extended memory very quickly and delivers the largest memory capacity in the industry. The IBM Enterprise X-Architecture chip is in its fifth generation with eX5 and leverages decades of IBM experience in integrating microelectronics to create first-of-a-kind silicon solutions. A component of the extended memory solution from IBM is the unique memory expansion provided by the external MAX5 memory chassis, which decouples server memory from system processors. MAX5 for eX5 rack and blade servers enables systems to more than double the number of addressable memory DIMMs per processor, raising memory capacity to as much as twice what is currently provided in the industry.

The new eX5 systems with IBM eXFlash solid state disk drives and MAX5 memory expansion represent a new class of energy-efficient, cost-effective, high-performance compute engines. The eX5 supports both the VMware ESXi and the open source KVM-based Red Hat RHEV-H virtualization hypervisors, enabling data center consolidation and high density compute configurations. With the IBM eX5 MAX5 memory expansion, complete databases can be held in memory, accelerating system performance and enhancing throughput by avoiding the latency associated with traditional page swapping. One case in point: a two socket eX5 system with MAX5 installed can support up to 320 virtual machines. This magnitude of virtualization conserves power, saves money on licensing costs, and significantly reduces environmental conditioning (HVAC and power) and space requirements.

IBM's eXFlash technology is an environmentally friendly replacement for older hard disk drive storage subsystems. eXFlash can reduce storage costs by up to 97% and deliver up to 30x more local database performance. With the addition of IBM's Systems Director capabilities, customers can pre-configure servers, remotely re-purpose systems, and set up automatic updates and recoveries.

Conclusion

With the increased emphasis on massively parallel / high throughput sequencing and the resulting extreme volumes of data, it makes sense to offload that data from the sequencing appliance to a nearby SMP HPC server such as an eX5 running Intel Xeon 7500 series processors. This solution is purpose built to maintain high throughput, provide large memory capacity, and provide access to high volumes of data for assembly, alignment, and secondary analysis such as that required for the identification of disease markers. In the case of high-throughput whole genome sequencers, base calling and other quality functions are typically performed on the raw genome (DNA base) sequence data while the data is resident in the appliance's memory. The reduced and mostly error free data can then be moved to a nearby server such as an IBM eX5 for assembly and alignment. As the Life Sciences industry reaches and passes the X PRIZE target of sequencing 100 human genomes in ten days, it will become necessary to offload this data from the sequencing appliance. HRG believes that an IBM eX5 system with expanded memory and high speed solid state disk drives is an ideal choice for satisfying these emerging high-throughput, large data Life Science analysis requirements.

Harvard Research Group is an information technology market research and consulting company. The company provides highly focused market research and consulting services to vendors and users of computer hardware, software, and services.

For more information please contact Harvard Research Group:

Harvard Research Group
PO Box 297
Harvard, MA 01451 USA
Tel. (978) 456 3939
Tel. (978) 925 5187
e-mail: hrg@hrgresearch.com
http://www.hrgresearch.com