Building a Multi-Tenant Infrastructure for Diverse Application Workloads
Rick Janowski, Marketing Manager, IBM Platform Computing
The Why and What of Multi-Tenancy
Parallelizable problems demand fresh approaches
- Financial services: market and credit risk, insurance credit scoring, fraud detection. "Calculate this now": over 500,000 scenarios, 500 instruments, 200 time steps. Mine 24 months of credit card purchases for 30,000,000 cardholders to identify credit-worthy customers by geography.
- Life sciences: genome mapping. Contrail performs assembly and mapping of large genomes in hours rather than weeks using the MapReduce programming model.
- CAE: parametric sweeps and design of experiments (DOE). Perform designs of experiment and parametric sweeps for a variety of computer-aided design applications to find optimal designs without physical prototyping.
Business drivers
Ever-increasing expectations:
- Insatiable appetite for deeper, more thorough analysis
- Results increasingly time-critical
- Better insights into exploding volumes of big data
- Control of administrative and infrastructure costs, to grow computing capability within IT budget constraints
Technical needs
Ever-increasing expectations:
- Increased performance to support business demands
- Increased scalability to address huge and growing volumes of data
- Optimized use of existing resources for scaled performance
- Efficient data management to remove data bottlenecks
- Support for new, cloud-native application workload patterns
- Effective operational management: monitoring, alerting, diagnostics and security
Scale up versus scale out
(Diagram comparing the two approaches.)
Big data and analytics infrastructure silos are inefficient
- Many new solution workloads in addition to existing apps
- Leads to costly, siloed, under-utilized infrastructure and replicated data
- Example silos: batch overnight financial reporting; counterparty credit risk modeling; distributed ETL and sensitivity analysis; Hadoop sentiment analysis
- Low utilization = higher cost
Scale-out challenges
- Silos: underutilization of resources; management and reporting challenges
- Different clusters for different workload types
- Arduous and time-consuming cluster reconfiguration between workload types
- Separate clusters for different versions of Hadoop (or other key applications)
The role of multi-tenancy
New cost- and space-efficient workload and resource management approaches offer solutions for multiple workload types, applications and Hadoop versions on the same cluster, though the terminology used to describe them remains inconsistent.
Multi-tenancy: the narrow view
A single instance of an application serves multiple client organizations (tenants). "In a multitenancy environment, multiple customers share the same application, running on the same operating system, on the same hardware, with the same data-storage mechanism." (Wikipedia: http://en.wikipedia.org/wiki/multitenancy)
Dimensions of multi-tenancy (shared services)
- Multiple users, groups or departments
- Multiple workload patterns
- Multiple sessions
- Multiple versions
- Multiple instances
- Multiple operating environments / platforms
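These dimensions can be pictured as a simple tenant descriptor. The sketch below is illustrative only, not a Platform Symphony API; the `Tenant` fields and the colocation rule are assumptions made for the example:

```python
from dataclasses import dataclass

# Hypothetical model: one record per tenant, capturing the sharing
# dimensions listed above (user/department, workload pattern, version,
# operating environment). Not a product data structure.
@dataclass(frozen=True)
class Tenant:
    department: str   # user, group or department
    workload: str     # e.g. "soa", "batch", "mapreduce"
    version: str      # multiple application versions may coexist on one grid
    platform: str     # operating environment, e.g. "linux-x86_64"

def can_colocate(a: Tenant, b: Tenant) -> bool:
    # Assumed rule for the sketch: tenants can share hosts when their
    # platforms match; departments and versions are isolated at the
    # application/session level rather than at the host level.
    return a.platform == b.platform
```
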
Implementing Multi-Tenancy
About Platform Symphony
- Heterogeneous grid management platform
- Supports multiple users, applications and lines of business on a shared grid
(Diagram: applications — A: Algorithmics, B: commercial software, C: proprietary models, D: other analytic apps — run over a workload manager and resource orchestration layer, alongside a data grid / data analytics layer.)
Platform Symphony: reducing cost
- Avoid expensive application and departmental silos
- Share infrastructure while protecting SLAs
- Avoid infrastructure spending
- Improve utilization
Platform Symphony: improve performance and predictability
- Sub-millisecond latency
- Massive scale
- Flex instantly to reflect business priorities
- Better-quality results, faster
Scenario: urgent pre-trade analysis to drive critical hedging decisions. Result: resources reallocated instantly according to policy, resulting in faster, more thorough simulation and a time-to-market advantage.
Platform Symphony: sophisticated resource sharing
- Enables sharing while preserving ownership
- Near 100% sustained resource utilization
- Allocations flex quickly to reflect business priorities
- Support new applications with existing infrastructure
Platform Symphony improves on application SLAs while using resources more efficiently than competing grid managers.
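The ownership-plus-sharing model described above can be sketched in a few lines. This is a simplified illustration of share-based allocation, not Symphony's actual scheduler; the `allocate` function and its greedy lending rule are assumptions for the example:

```python
# Hypothetical sketch of ownership with lending: each department owns a
# fixed share of cores; cores idle within one department's share are lent
# to departments whose demand exceeds their own share.
def allocate(total_cores, ownership, demand):
    """ownership: dept -> owned core count; demand: dept -> cores wanted."""
    # Each owner first keeps up to its demand from its own share.
    alloc = {d: min(ownership[d], demand[d]) for d in ownership}
    idle = total_cores - sum(alloc.values())
    # Lend idle cores to the neediest departments first (greedy rule,
    # standing in for real time-based ownership and sharing policies).
    for d in sorted(ownership, key=lambda k: demand[k] - alloc[k], reverse=True):
        take = min(demand[d] - alloc[d], idle)
        alloc[d] += take
        idle -= take
    return alloc
```

In a real grid manager the lender can also reclaim its cores when its own demand returns, which is what preserves ownership while still driving utilization toward 100%.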
Platform Symphony MultiCluster: scale beyond a single cluster
- Unified management of distributed clusters
- Full visibility into resources, users and applications
- Select the appropriate cluster at runtime
- Maximize resource usage
- Simplify reporting and capacity planning
How is Platform Symphony unique?
- Low latency / high throughput: sub-millisecond response; >17,000 tasks per second
- Large scale: 10,000 cores per application; 280,000 cores per grid
- Cost-efficient shared services: multi-tenant grid solution; guarantees SLAs while encouraging resource sharing; easy to on-board new grid applications; maximizes use of grid resources
- Heterogeneous and open: Linux, Windows / Windows HPC, AIX, Solaris; C/C++, C#, Java, Excel, Python, R
- Smart data handling and data affinity; native, optimized MapReduce implementation
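Data affinity means scheduling a task on a host that already holds the task's input data, so the data does not have to move over the network. A minimal sketch of the idea, with a hypothetical helper rather than the product's scheduler:

```python
# Illustrative data-affinity placement (assumed helper, not a product API):
# prefer a host that already caches the task's input block; otherwise fall
# back to any available host and accept the data transfer.
def place(block_id, host_blocks, hosts):
    """block_id: input block the task needs.
    host_blocks: host -> set of block ids cached on that host.
    hosts: available hosts, in fallback order."""
    for h in hosts:
        if block_id in host_blocks.get(h, set()):
            return h  # data-local placement, no transfer needed
    return hosts[0]   # no local copy anywhere: pick any free host
```
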
Use case for grid: liquidity risk analysis
- The client: a publicly traded bank with both retail and wholesale operations in over 120 countries and over $500 billion in assets under management
- High growth impacting its risk IT: the bank uses IBM Algorithmics for its liquidity risk analysis (LRA), running on dedicated servers; time-to-completion had grown to 100 hours for what is now 150,000-plus records
- Added a Platform Symphony grid: increased the number of available cores six-fold by borrowing idle cores; decreased time-to-completion for LRA to 10 hours
Platform Symphony and Algorithmics
IBM Algorithmics
- Market risk
- Credit risk
- Liquidity risk
- Collateral and capital management
Algorithmics presents multiple opportunities for parallelism
(Diagram: the Algo One pipeline — customer data is reformatted by Risk Mapper and loaded into the Algo Data Server; scenario generation feeds simulation in RiskWatch; aggregation runs in the Algo Risk Engine; results are explored through Algo Cube Explorer — all on Platform Symphony, a shared-services grid infrastructure for compute- and data-intensive applications.)
Algorithmics presents multiple opportunities for parallelism
1. Opportunity for multiple concurrent risk mappers to reformat customer data into a schema loadable by the Algo Data Server
2. Algo database performance can benefit from parallel database technologies
3. RiskWatch can run multiple parallel simulation scenarios on subsets of instruments, generating cubelets representing simulation results
4. ASE is already parallelized, and multiple aggregation activities can run concurrently across various job streams controlled by Algo Batch
All on Platform Symphony, a shared-services grid infrastructure for compute- and data-intensive applications.
Why use a grid manager with Algorithmics?
- Best host dynamically selected at run-time: avoids hard-coding hosts; easier to manage as the environment grows
- Grid guarantees task execution, avoiding the need for elaborate exception handling and scripting
- Dramatic reduction in process run-times: SLAs guaranteed, task completion within batch windows assured
- Improved administrator productivity
- Enhanced quality of service to analysts and business users
- Use assets more efficiently, reducing total infrastructure cost
Platform Symphony integration from an Algorithmics user's perspective
- Analytic processes in Algorithmics run under the control of Algo Batch
- Job boxes represent discrete units of work in Algo workflows
- High-level tasks can be parallelized within ABE
- Stream execution can be dramatically accelerated by using a grid manager to speed up individual job boxes
Platform Symphony integration from an Algorithmics user's perspective
Platform Symphony can accelerate analytic processes such as simulation and cube generation by enabling tasks to execute reliably in parallel at large scale.
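The fan-out/aggregate pattern behind this — split the instrument set into subsets, simulate each subset in parallel to produce a cubelet, then aggregate — can be sketched locally, with Python's standard thread pool standing in for a grid session. This is illustrative only; `simulate_cubelet` is a placeholder valuation, not Algorithmics code:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_cubelet(instruments, scenarios):
    # Placeholder valuation: scale the subset's total exposure by each
    # scenario factor. A real engine would revalue each instrument.
    return [sum(instruments) * s for s in scenarios]

def run_simulation(instruments, scenarios, chunk=2):
    # Fan out: one task per subset of instruments (a "cubelet" each).
    chunks = [instruments[i:i + chunk] for i in range(0, len(instruments), chunk)]
    with ThreadPoolExecutor() as pool:  # stand-in for a grid session
        cubelets = list(pool.map(lambda c: simulate_cubelet(c, scenarios), chunks))
    # Aggregate: combine cubelets into one result per scenario.
    return [sum(vals) for vals in zip(*cubelets)]
```

Because each subset is valued independently, the fan-out step scales with the number of available cores, which is exactly what the grid manager exploits.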
Platform Symphony integration from an Algorithmics user's perspective
Platform Symphony reflects Algo Batch job box names as session names and tags for ease of management.
Unique advantages of the Platform Symphony integration
- RiskWatch instances are started once and re-used dynamically depending on the relative priorities of Algo job streams, avoiding the need to start and stop instances for each MtF cubelet
- RiskWatch instances are dynamically assigned according to Symphony loaning and borrowing rules and the relative priority of job streams
- Resource allocations can flex dynamically based on time-based ownership rules, sharing policies and priorities
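The priority-driven reassignment of warm RiskWatch instances can be illustrated with a toy proportional-share calculation. This is an assumption-laden sketch, not Symphony's loaning/borrowing implementation:

```python
# Hypothetical sketch: a pool of long-lived worker instances is split
# between sessions in proportion to session priority, instead of starting
# and stopping instances per task.
def reassign(workers, sessions):
    """workers: pool size; sessions: session name -> priority weight."""
    total = sum(sessions.values())
    alloc = {s: (p * workers) // total for s, p in sessions.items()}
    # Hand any rounding remainder to the highest-priority session.
    alloc[max(sessions, key=sessions.get)] += workers - sum(alloc.values())
    return alloc
```

When a session's priority changes, recomputing this split moves already-running instances between sessions; the expensive instance start-up happens only once.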
The case for a grid / workload manager
Platform Symphony enables near 100% resource utilization by enforcing the notion of ownership while still enabling departments to share resources based on flexible policies.
Conclusion
Big data and analytics infrastructure silos are inefficient
- Many new solution workloads in addition to existing apps
- Leads to costly, siloed, under-utilized infrastructure and replicated data
- Example silos: batch overnight financial reporting; counterparty credit risk modeling; distributed ETL and sensitivity analysis; Hadoop sentiment analysis
- Low utilization = higher cost
IBM Platform Symphony
- A multi-tenant shared-services platform with sophisticated resource-sharing capabilities
- Manages diverse system and application services on a shared infrastructure: ISV applications and in-house developed applications
- Optimized, low-latency, Hadoop-compatible run-time
- Can be used to launch, persist and manage non-grid-aware application services
- Supports long-running cloud-native frameworks such as MongoDB and Cassandra*
- Service controller guarantees reliable execution and understands dependencies
(* Application Service Controller, announced October 2014)
The state of the art
- Organizations need ways to add new workloads, implement new applications and deploy new versions of Hadoop without having to deploy new clusters
- While there are emerging approaches aimed at reducing the need for multiple, disparate clusters, each has significant limitations
- IBM Platform Symphony combines multi-tenancy / shared services with multi-modality, so organizations can run SOA, batch and long-running service workloads on a single cluster
- Built on an architecture designed for low-latency scheduling, Platform Symphony gives organizations the performance needed for time-critical jobs while maximizing the value of their hardware investments
Rick Janowski
rjanowski@us.ibm.com
+1 720-395-7852