InfoSphere DataStage Grid Solution

Similar documents
An IBM Proof of Technology IBM Workload Deployer Overview

IBM Tivoli Monitoring

IBM Tivoli Monitoring (ITM) Georgia IBM Power User Group. Chip Layton Senior IT Specialist STG Lab Services

HP Cloud Maps for rapid provisioning of infrastructure and applications

IBM Tivoli Workload Automation View, Control and Automate Composite Workloads

IBM A Accessment: Power Systems with POWER7 Common Sales Skills -v2.

St Louis CMG Boris Zibitsker, PhD

InfoSphere Warehousing 9.5

IBM Tivoli OMEGAMON XE on z/vm and Linux

Prepare for a more efficient SAP implementation: Take data issues off the critical path

<Insert Picture Here> Oracle Exalogic Elastic Cloud: Revolutionizing the Datacenter

Advantage Risk Management. Evolution to a Global Grid

FUT SLES for POWER Trends and Roadmap. Jay Kruemcke Product

A Examcollection.Premium.Exam.35q

Grid 2.0 : Entering the new age of Grid in Financial Services

Building a Multi-Tenant Infrastructure for Diverse Application Workloads

IBM Db2 Warehouse. Hybrid data warehousing using a software-defined environment in a private cloud. The evolution of the data warehouse

Cloud Computing An IBM Perspective

IBM ICE (Innovation Centre for Education) Welcome to: Unit 1 Overview of delivery models in Cloud Computing. Copyright IBM Corporation

VMware Cloud Automation Design and Deploy IaaS Service

IBM Smarter systems for a smarter planet IBM workload optimized systems

IBM i Reduce complexity and enhance productivity with the world s first POWER5-based server. Highlights

Tech Mahindra s Cloud Platform and PaaS Offering. Copyright 2015 Tech Mahindra. All rights reserved.

Oracle Cloud Blueprint and Roadmap Service. 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Why You Need An Enterprise-wide Cloud Strategy?

Introduction to IBM Technical Computing Clouds IBM Redbooks Solution Guide

IBM WebSphere Extended Deployment, Version 5.1

IBM Business Process Manager on Cloud

Amsterdam. (technical) Updates & demonstration. Robert Voermans Governance architect

DataAdapt Active Insight

IBM Tivoli Endpoint Manager for Software Use Analysis

IBM Digital Analytics Accelerator

Deploying IBM Cognos 8 BI on VMware ESX. Barnaby Cole Practice Lead, Technical Services

Product Brief SysTrack VMP

IBM Grid Offering for Analytics Acceleration: Customer Insight in Banking

IBM Power Systems. Bringing Choice and Differentiation to Linux Infrastructure

IBM Information Server on Cloud

IBM Information Server on Cloud

Retail Business Intelligence Solution

What is your definition of DevOps?

Service Management for the Mobile Mainframe Delivered via Cloud Lunch and Learn

InfoSphere Warehouse. Flexible. Reliable. Simple. IBM Software Group

Bluemix Overview. Last Updated: October 10th, 2017

RODOD Performance Test on Exalogic and Exadata Engineered Systems

Introduction to Enterprise Computing. Computing Infrastructure Matters

VMware vcenter Operations Standard

IBM Oracle ICC. IBM s Live Partition Mobility (LPM): Oracle s Database Support

IBM SmartCloud Control Desk: High Availability and Disaster Recovery Configurations IBM Redbooks Solution Guide

Services Guide April The following is a description of the services offered by PriorIT Consulting, LLC.

Accelerate Deployment

Administering System Center Configuration Manager and Intune (20696C)

IBM Tivoli Workload Scheduler

IBM Virtual Appliance for Oracle Database

UAB Condor Pilot UAB IT Research Comptuing June 2012

NVIDIA QUADRO VIRTUAL DATA CENTER WORKSTATION APPLICATION SIZING GUIDE FOR SIEMENS NX APPLICATION GUIDE. Ver 1.0

Enterprise-Scale MATLAB Applications

IBM dashdb for Analytics

Systems Management of the SAS 9.2 Enterprise Business Intelligence Environment Gary T. Ciampa, SAS Institute Inc., Cary, NC

IBM Business Process Manager on Cloud

SAVE MAINFRAME COSTS ZIIP YOUR NATURAL APPS

Customer IT Automation Success Story

Viewpoint Transition to the cloud

Meetup DB2 LUW - Madrid. IBM dashdb. Raquel Cadierno Torre IBM 1 de Julio de IBM Corporation

IBM Domino Applications On Cloud. Overview and Q&A with Business Partners

Exalogic Elastic Cloud

IBM Balanced Warehouse Buyer s Guide. Unlock the potential of data with the right data warehouse solution

You can plan and execute tests across multiple concurrent projects and people by sharing and scheduling software/hardware resources.

On demand operating environment solutions To support your IT objectives Transforming your business to on demand.

<Insert Picture Here> Cloud: Is it Ready for Prime Time?

Cloud Capacity Management

Enterprise Removable Media Manager

SERVICE DESCRIPTION MANAGED PRIVATE CLOUD

The IBM and Oracle alliance. Power architecture

You can plan and execute tests across multiple concurrent projects and people by sharing and scheduling software/hardware resources.

Learn How To Implement Cloud on System z. Delivering and optimizing private cloud on System z with Integrated Service Management

"Charting the Course... MOC A: Architecting Microsoft Azure Solutions. Course Summary

Implementing a Service Management Architecture

Agenda. z/vm and Linux Performance Management. Customer Presentation. New Product Overview. Opportunity. Current products. Future product.

IBM Informix Dynamic Server (IDS) on Amazon EC2

Innovative solutions to simplify your business. IBM System i5 Family

The Information Integration Platform

NetVault Backup System Administration Complete Instructorled

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

zexperten Forum Luzern 10/2016 Cloud on Mainframe / Don t worry be API

IBM United States Software Announcement , dated March 27, 2018

SapphireIMS 4.0 ITAM Suite Feature Specification

Accelerating Computing - Enhance Big Data & in Memory Analytics. Michael Viray Product Manager, Power Systems

Application Migration Patterns for the Service Oriented Cloud

IBM Global Technology Services. Weaving the solution Dharanibalan Gurunathan 1 st August, Mumbai

IBM _` iseries systems Retail

Dell Advanced Infrastructure Manager (AIM) Automating and standardizing cross-domain IT processes

Using SAP with HP Virtualization and Partitioning

Virtual Desktop Computing Solution Request for Proposal (RFP), Addendum 1 October 22 nd, 2012

Translate Integration Imperative into a solution Framework. A Solution Framework. August 1 st, Mumbai By Dharanibalan Gurunathan

IBM Software Services for Lotus To support your business objectives. Maximize your portal solution through a rapid, low-risk deployment.

IBM SmartCloud public images with selected software

Integrated systems for operational analytics

Nikon Precision's Journey to the PLM Cloud. Ryan Porzel

IBM Business Process Manager on Cloud

Trasformare il Business con Soluzioni Cloud

Transcription:

InfoSphere DataStage Grid Solution Julius Lerm IBM Information Management 1 2011 IBM Corporation

What is Grid Computing? Grid Computing doesn t mean the same thing to all people. GRID Definitions include: Using Idle machines on the internet Using Idle desktop machines within the company Using any server that s not currently in use Regardless of OS, physical Location, CPU speed Anything running more than 1 Linux box in any manner Anything running on any computer when you don t care which computer it runs on

What is Grid Computing with InfoSphere DataStage? Low cost solution that provides high throughput processing InfoSphere DataStage Grid Toolkit Allows for maximum resource allocation flexibility to one or more project teams Available for DataStage, QualityStage, and Information Analyzer Enables both grid distribution methods simultaneously assigning jobs to specific servers in the grid assigning a single parallel job to run across multiple servers Platforms: RedHat or SuSE (Intel/AMD) AIX/Power

What Is Driving Rapid Customer Adoption of Data Integration Grid? Value $$ Better decisions based on better data yields ROI$$$ Grid-based integration makes it possible for companies to process and analyze larger data volumes, create a consolidated view of data, and put the right data into the enterprise data warehouse and other critical enterprise applications More sources of data, more data from each source, better matching, real-time versus batch Better data yields: Better business decisions Enhanced customer relationships More cross selling and upselling New services delivered to customers Reduced Data Integration Costs Reduced administration and operating costs centralization of staff Reduced data integration project costs lower cost per project delivered by data integration center of excellence versus siloed projects Reduced hardware costs Cost $$

Benefits of Grid Computing Low cost hardware High-throughput processing Significant ROI (Return on Investment) for data management solutions Supports a high-availability (HA) solution Resource manager monitors availability of hardware at startup / job deployment time SLA (Service Level Agreement) Consistent runtimes Isolates concurrent job executions Shared resource pool Not typical silo-ed environment Hardware shared across multiple environments and departments

Why Grid? Improve the return on infrastructure investments! Ø Help improve infrastructure price/performance Ø Improve the utilization of computing resources Ø Help provide unlimited scalability and offer capacity on demand Ø Optimize the allocation of resources to applications Ø Help reduce complexity, consolidate servers, storage and data centers Ø Provide a highly available environment Ø Help eliminate single points of failure Ø Optimize use of available processing resources Ø Ensures that application tasks complete within stable predictable time frame à improving SLA performance

Before Grid ProfileStage DataStage QualityStage DataStage... Project 1 Project 2 Project 3 Project 4 IBM Software Project N SMP 1 SMP 2 SMP 3 SMP 4 SMP N Silo-ed architecture & proliferation of SMP servers: Higher capital costs through limited pooling of IT assets across silos Higher operational costs Limited responsiveness due to more manual scheduling and provisioning Inherently more vulnerable to failure No ability to exploit available capacity when other teams are idle

After Grid ProfileStage Project 1 DataStage Project 2 QualityStage Project 3 DataStage Project 4... IBM Software Project N DataStage Multi-Process Grid Framework Grid enables increased efficiency, responsiveness, variability and speed. Virtualized infrastructure: Node 1 Node 2 Node 3 Node 4 Creates a virtual data integration collaboration environment Virtualizes application services execution... Node N Dynamically fulfills requests over a virtual pool of system resources (nodes) Offers an adaptive, self-managed operating environment that guarantees high availability Delivers maximum available capacity to anyone participating in the grid Eliminates The SMP nightmare, Allows Unlimited Scalability

InfoSphere DataStage Grid Definition It s an InfoSphere DataStage Cluster that supports the dynamic creation of Config Files All the requirements for a Cluster apply Adds Resource Manager Grid Toolkit (GTK)

Cluster Requirements Shared Storage NIS/LDAP Networking Passwordless SSH HA Solution DB Connectivity Users Review System Requirements Platform Requirements Usually Implemented by Client s own sysadmins and storage/ networking personnel

Sample IS Grid

Resource Management Tracks resources (nodes) based on which jobs are already running, which servers are down Queues jobs when no resources are available Provides a list of nodes that are assigned for a job Extensive advanced features We leverage a subset of the features Manager node where tasks are scheduled and resources allocated Usually happens on the head node Compute nodes have agent processes that communicate back to the manager Jobs (scripts or executables) are started on compute node, not head node

LoadLeveler Classes LoadL_admin file: dsbatch: type = class # class for medium jobs priority = 50 # ClassSysprio max_total_tasks = 50 class_comment = "Class for DataStage batch jobs dsrealtime: type = class # class for medium jobs priority = 50 # ClassSysprio max_total_tasks = 50 class_comment = "Class for DataStage batch jobs LoadL_config.local: CLASS = dsbatch(4) dsrealtime(3)

Grid Enablement Toolkit What does it do? Prebuilt integration with resource managers Coordinates activities between the parallel framework and the resource manager Creates the parallel configuration file to drive the dynamic assignment of compute resources Logging (interaction w/ RM, usage details)

Grid Environment Variables In Administrator

Grid Toolkit Environment Variables APT_GRID_ENABLE YES: Current osh will intercept the run script to create a new configuration file NO: Use the existing configuration file APT_GRID_QUEUE Name of the Resource Manager queue the job will be submitted to APT_GRID_COMPUTE_NODES The number of compute nodes required for the job Used to request the number of compute nodes in the dynamically created configuration file A compute node is a server that can be used for processing Not e.g. dedicated for IO or DB2 Default value is 1 APT_GRID_PARTITIONS Used to create multiple partitions for each compute node Default value is 1

Dynamic Config file Dynamic Config File Static Config File 19

Grid Config Template 20

Generated Config APT_GRID_COMPUTENODES=3 21

Information Server (IS) Grid Solution Offering A total solution approach to implement an Information Server Grid system based on proven methods & best practices Features This offering consists of the following phases: Education & Project Planning 22 Common understanding of IS Grid Architecture & Design Customer responsible for building the Grid Infrastructure IS Grid Implementation Install & configure the IS Grid Framework Test & Validate Address Administration Hand-Off Monitor during 24 hours Backed by world class industry and product experts in deploying InfoSphere software Benefits Leverage repeatable, proven processes and standard collaterals to reduce costs and project risks Accelerate the time to value and return on investment with the knowledge and best practices brought by our Information Server Grid experts Gain knowledge transfer and mentoring from our experts Deliverables Environment Prerequisite Checklist Grid Planning & Architecture Document Project Plan System Configuration Guide Standard Collateral Build your own Grid Toolkit Grid Enablement Toolkit Basic High Availability

Project Approach Educatio n & Project Planning Ø Education & Project Planning Workshop (1 week) Educate/obtain a common understanding about a grid environment Start to create a customer environment prerequisite checklist Ø Architecture & Design Workshop (15 days spread in 3 to 4 weeks) Interactive discussion/collaboration with the customer to finalize the architecture of the grid environment Finalize the customer environment prerequisite checklist Ø Implementation of the Infrastructure by Customer or GTS Following the customer environment pre-requisite checklist (System Requirements) Ø IS Grid Implementation (3 weeks) Implementatio n by Customer Architecture Design IS Grid Implem entatio n Review/check that the customer environment is ready Install & Configure Information Server Grid Framework based on the defined architecture Migrate existing parallel DataStage jobs & sequences (max. 50) to the new environment (optional, as needed) Two days monitoring period One SOW that includes all three Phases

Deliverables Ø Education & Project Planning Workshop Understanding all infrastructure implication to operate a GRID Environment (Standard Collateral s) Build your own Grid Toolkit Grid Enablement Toolkit Basic High Availability Project Plan Ø Architecture Design Environment Prerequisite Checklist Grid Planning & Architecture Ø IS Grid Implementation Grid Toolkit Software System Configuration Guide Validated Test

Education Prerequisite Code Name Audience Duration DX444 DataStage Essentials Learn about DataStage V8.1 in its IBM Information Server environment. Learn how to build DataStage parallel jobs that read and write data to and from a variety of data stores including sequential files, data sets, and relational tables. Also, learn how to build parallel jobs that process data in a variety of ways: business transformations, data filtering, data combining, data generation, sorting, and aggregating. 4 days

Other Reference Ø GTS Grid Offerings: http://spimweb1.boulder.ibm.com/services/sosf/dyno.wss?oid=625#1 Ø GTS - Implementation Services for SAN storage software http://spimweb1.boulder.ibm.com/services/sosf/dyno.wss?oid=771 Ø GTS - Implementation Services for Network Attached Storage systems http://spimweb1.boulder.ibm.com/services/sosf/dyno.wss?oid=805 Ø Grid Redbook: http://www.redbooks.ibm.com/redbooks/pdfs/sg247625.pdf Ø White Paper: http://w3-103.ibm.com/software/xl/services/document/getattachment/document/p911991a56203i41/ attachment/imw12027-usen-00.pdf Ø Tivoli Dynamic Workload Broker: http://www-304.ibm.com/jct03001c/services/learning/ites.wss/us/en? pagetype=course_description&coursecode=to075