Managing Data Warehouse Growth in the New Era of Big Data

Similar documents
InfoSphere Warehouse. Flexible. Reliable. Simple. IBM Software Group

MapR Pentaho Business Solutions

Strategies for Taming Data Growth through Archiving

Information Lifecycle Management Solution from IBM

Aurélie Pericchi SSP APS Laurent Marzouk Data Insight & Cloud Architect

A Better Way to Run SAP Joakim Zetterblad Director SAP Practice, EMC EMEA

REPENSEZ VOTRE STRATÉGIE SAP ET ENTREZ DANS LE CLOUD HYBRIDE

InfoSphere Warehousing 9.5

ActualTests.C Q&A C Foundations of IBM Big Data & Analytics Architecture V1

Innovative solutions to simplify your business. IBM System i5 Family

Microsoft FastTrack For Azure Service Level Description

COMPANY PROFILE.

The IBM Reference Architecture for Healthcare and Life Sciences

Oracle Integrates Virtual Tape Storage with Public Cloud Economics

Data Archiving. The First Step Toward Managing the Information Lifecycle. Best practices for SAP ILM to improve performance, compliance and cost

Harnessing the Power of Big Data to Transform Your Business Anjul Bhambhri VP, Big Data, Information Management, IBM

From Information to Insight: The Big Value of Big Data. Faire Ann Co Marketing Manager, Information Management Software, ASEAN

Analyze Big Data Faster and Store it Cheaper. Dominick Huang CenterPoint Energy Russell Hull - SAP

Modernizing Data Integration

How Data Science is Changing the Way Companies Do Business Colin White

Bringing the Power of SAS to Hadoop Title

Oracle's Cloud Strategie für den Geschäftserfolg Alles Neue von der OOW

SAS & HADOOP ANALYTICS ON BIG DATA

Cloud Based Analytics for SAP

Challenges and Chances of Nearline Storage

Stewart Bazneh. IBM Storwize V National Gallery, Dublin (17 th November) 2010 IBM Corporation

Using the Blaze Engine to Run Profiles and Scorecards

DELL EMC POWEREDGE 14G SERVER PORTFOLIO

IBM Systems Group. IBM TotalStorage Ultrium Tape Family

Spectrum Control Capacity Planning

The Intersection of Big Data and DB2

Oracle Autonomous Data Warehouse Cloud

Get Your Documentum, SAP, SharePoint Data Under Control

Oracle Engineered Systems and WeDo Technologies

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

INTRODUCING BIRST INFOR S GO-FORWARD BUSINESS INTELLIGENCE SOLUTION

Architecture Overview for Data Analytics Deployments

Lenovo Services for the Data Center

DEFINING THE ROI FOR MEDICAL IMAGE ARCHIVING

Microsoft BI Product Suite

Simplifying the Process of Uploading and Extracting Data from Apache Hadoop

Dynamics 365 for Finance and Operations: Cloud, On-Premises, and Hybrid Deployment Options

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

vsphere with Operations Management and vcenter Operations VMware vforum, 2014 Mehmet Çolakoğlu 2014 VMware Inc. All rights reserved.

Oracle Autonomous Data Warehouse Cloud

IBM ICE (Innovation Centre for Education) Welcome to: Unit 1 Overview of delivery models in Cloud Computing. Copyright IBM Corporation

In-Memory Analytics: Get Faster, Better Insights from Big Data

Boston Azure Cloud User Group. a journey of a thousand miles begins with a single step

Sr. Sergio Rodríguez de Guzmán CTO PUE

IBM Analytics Six reasons to upgrade your database

BMC - Business Service Management Platform

DELL EMC Isilon & ECS for Healthcare

S U R V E Y I D C O P I N I O N. Cushing Anderson

2015 IBM Corporation

The Evolution of Big Data

Spectrum Storage Suite -Getting From Here to There. Karel van der Woude

What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data?

EMC Information Infrastructure Solutions for Healthcare Providers. Delivering information to the point of care

Ensuring a Sustainable Architecture for Data Analytics

COMPARE VMWARE. Business Continuity and Security. vsphere with Operations Management Enterprise Plus. vsphere Enterprise Plus Edition

Quantum Artico Active Archive Appliance

Xerox DocuShare 7.0 Content Management Platform. Enterprise content management for every organization.

S/4 HANA Introduction & Roadmap. Dr. Bjoern Ganzhorn Enterprise Architect - SAP Americas Inc.

Cognitive enterprise archive and retrieval

ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS)

Data Governance and Data Quality. Stewardship

Solution Brief: Using a XenData Archive with Apace postmam for Managing and Retaining your Video Assets

Innovate with Oracle Public Cloud Platform & Infrastructure Services

Exadata Implementation and Performance Benefits in Toyota Motor Sales USA.

1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Oracle DBaaS Migration Road Map. 27 January, 2016

DELL EMC ISILON STORAGE SOLUTIONS FOR MEDIA AND ENTERTAINMENT

Going beyond today: Extending the platform for cloud, mobile and analytics

Title: HP OpenView Configuration Management Overview Session #: 87 Speaker: Loic Avenel Company: HP

Unified Archiving with dg hyparchive. Compliance. Security. Transparency.

Huntington Bancshares

Microsoft Azure Essentials

What's Shaping the Future of Enterprise Content. Management? JOHN O MELIA

LEVERAGING DATA ANALYTICS TO GAIN COMPETITIVE ADVANTAGE IN YOUR INDUSTRY

Oracle Big Data Cloud Service

IBM Enterprise Content Management. Why IBM ECM Unique value propositions and differentiators from IBM Enterprise Content Management

Big Data Management Best Practices for Data Lakes Philip Russom, Ph.D.

Universal Storage for Data Lakes: Dell EMC Isilon

7-Eleven Doing More With Less Through Dynamic IT

Big Data The Big Story

Why more and more SAP customers are migrating to Solaris

Boomi Basics: Going Beyond Integration with APIs, Data Management and Workflow Automation

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

An Overview of the AWS Cloud Adoption Framework

Oracle s Platform for SAP Solutions

Introduction to IBM Insight for Oracle and SAP

IBM Big Data Summit 2012

BEYOND ERP : DELIVERING AMBITIOUS GLOBAL TRANSFORMATION AND GROWTH

Increased Informix Awareness Discover Informix microsite launched

WebSphere Cast Iron Overview IBM Corporation

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

2017 IBM Elastic Storage Server - Update

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

Business Requirements Definitions

Delivering Data Warehousing as a Cloud Service

Transcription:

Managing Data Warehouse Growth in the New Era of Big Data Colin White President, BI Research December 5, 2012

Sponsor 2

Speakers Colin White President, BI Research Vineet Goel Product Manager, IBM InfoSphere Optim 3

Managing Data Warehouse Growth in the New Era of Big Data Colin White President, BI Research TDWI/IBM Webinar, December 2012

The Evolution of Digital Data First OLTP systems Early decision support products First commercial RDBMSs Early data warehousing Big data & advanced analytics 2012 Sabre 84,000 txs/day Increasing Data Volumes Sabre 60,000 txs/sec 25 TB EDW Copyright BI Research, 2012 5

Data Growth Choose an Analyst! Source: IDC Copyright BI Research, 2012 6 5

Data Growth and Big Data Big data technologies apply to all types of digital data not just multi-structured data Big is a relative term and is different for each organization and application What you do with big data and how you use it for business benefit should be the main consideration business analytics play a key role here Copyright BI Research, 2012 7 5

The Value of Data: IBM 2012 Study Copyright BI Research, 2012 8 5

The Value of Data: IBM 2012 Study Copyright BI Research, 2012 9 5

The Changing World of Business Analytics Advanced Analytics Improved analytic tools and techniques for statistical and predictive analytics New tools for exploring and visualizing new varieties of data Operational intelligence with embedded BI services and BI automation Big Data Management Analytic relational database systems that offer improved price/performance and libraries of analytic functions Non-relational systems such as Hadoop for handling new types of data Stream processing/cep systems for analyzing in-motion data In-memory computing for high performance Copyright BI Research, 2012 10 5

Advanced Analytics Example: SC Digest 2012 Advanced Analytics in Supply Chain, Dr. Michael Watson, Supply Chain Digest, November 2012 1 Copyright BI Research, 2012 11 1 5

Advanced Analytics Example: SC Digest 2012 1. Descriptive analytics using historical data to describe the business. This is usually associated with Business Intelligence (BI) or visibility systems. In supply chain, you use descriptive analytics to better understand your historical demand patterns, to understand how product flows through your supply chain, and to understand when a shipment might be late. 2. Predictive analytics using data to predict trends and patterns. This is commonly associated with statistics. In the supply chain, you use predictive analytics to forecast future demand or to forecast the price of fuel. 3. Prescriptive analytics using data to suggest the optimal solution. This is commonly associated with optimization. In the supply chain, you use prescriptive analytics to set your inventory levels, schedule your plants, or route your trucks. Advanced Analytics in Supply Chain, Dr. Michael Watson, Supply Chain Digest, November 2012 1 Copyright BI Research, 2012 12 2 5

What Then is Big Data? Represents analytic and data management solutions that could not previously be supported because of: Technology limitations poor performance, inadequate analytic capabilities, etc. High hardware and software costs Incomplete or limited data for generating the required solutions Set of overlapping technologies that enable customers to deploy analytic systems optimized to suite specific business needs and workloads Optimization may involve improving performance, reducing costs, enabling new types of data to be analyzed, etc. Copyright BI Research, 2012 13

Big Data and Data Life Cycle Management Data Management and Analytic Performance Capacity planning Managing data warehouse growth Analytic performance management and optimization Service level agreements Data Governance Security: user access, encryption, masking, etc. Quality: governed/ungoverned data Backup and recovery Archiving and retention: historical analysis, compliance Copyright BI Research, 2012 5 14

The Impact of Big Data on the Data Life Cycle Need fast time to value to quickly gain business benefits from big data Impractical to use traditional EDW approach for all analytic solutions Extend existing data warehousing environment to support big data and accommodate data growth Need high performance solutions for supporting big data analytic workloads One-size fits all data management is no longer viable Match technologies and costs to business needs and analytic workloads Need improved data governance to handle big data No longer practical to rigidly control and govern all forms of data implement different levels of governance based on security, compliance and quality needs Determine data archiving and policies based on the possible future need to analyze historical data and data compliance requirements 1 Copyright BI Research, 2012 15 5 5

The New Extended Data Warehouse IMPROVE EXTEND Copyright BI Research, 2012 16 5

The Quest for Performance: Optimized Systems Copyright BI Research, 2012 5 17

Optimized Systems: Hardware Directions Faster processors Multi-core processors Intelligent hardware 64-bit memory spaces Large-capacity disk drives Fast hard-disk and solid-state drives Hybrid storage configurations Scale-up/out parallel processing configurations Lower-cost hardware (blades, clusters) Reduced power and cooling requirements Packaged hardware/software appliances Copyright BI Research, 2012 1810

Managing Data Warehouse Growth: Storage Options Large-capacity hard-disk drives (HDD) More economical, less reliable, slower performance, e.g., SATA drives in white-box H/W Often short-stroked to improve performance High-performance hard-disk drives (HDD) More expensive, more reliable, better performance, e.g., enterprise SAS drives Solid-state drives (SSDs) High and consistent performance Better reliability and more energy efficient Distinguish between commodity SSDs and enterprise SSDs Dynamic RAM (DRAM) Best performance - eliminates I/O overheads Use by in-memory computing systems Copyright BI Research, 2012 1 19 9

Managing Data Warehouse Growth: Tiered Storage Hybrid (tiered) storage Employs a combination of different storage devices and options Data location is based on availability, usage, and performance requirements Some DBMSs support systemenabled data migration Table/partition vs block-level migration Warm data Hot data Highperformance RAM & SSDs High-performance HDDs Performance Tier Performance & capacity Tier Administrator-driven vs automatic migration Consider relationship to workload management and data archiving & retention strategies Cold data High-capacity HDDs Capacity & archive tier Copyright BI Research, 2012 2 20 0

Managing DW Growth: Archiving/Retention - 1 The price/performance of new software and hardware technologies enables more data to be kept online longer Data warehouse data is likely to be managed on multiple systems for performance, cost, or business reasons Policies are required to manage the archiving (for possible future analysis) and retention (for compliance) of data warehouse data These policies must be coordinated with corporate strategy especially in the area of compliance Copyright BI Research, 2012 5 21

Managing DW Growth: Archiving/Retention - 2 Operational, analytical and collaborative computing are becoming inextricably linked and this should be considered when developing retention policies Data security, quality, archiving and retention are related, but may need different levels of governance Data archiving and retention solutions may be home grown, use best-of-breed products, or employ an enterprise-level software approach Copyright BI Research, 2012 5 22

Summary Organizations are experiencing huge data growth Big data involves all digital data, but big is different for each organization and application Big data is a set of overlapping technologies that can provide huge business benefits but create some significant governance issues Big data involves the blending of governed and ungoverned data Impossible to govern all data and organizations need to implement flexible policies that: Allow easy but secure access to data Provide elastic capacity and good performance Efficiently manage data archiving and retention Copyright BI Research, 2012 5 23

Managing Data Warehouse Growth IBM InfoSphere Optim, Information Management Information Management 2012 IBM Corporation

Information Management IBM Big Data Strategy New analytic applications drive the requirements for a big data platform Integrate and manage the full variety, velocity and volume of data Apply advanced analytics to information in its native form Visualize all available data for ad-hoc analysis Need for trusted data drives requirements for an information integration and governance platform Ensure the highest quality information Master data into a single view Secure and Protect sensitive data to minimize risk and exposure Govern data throughout its lifecycle BI / Reporting IBM Big Data Platform Visualizati on & Application Developm ent Systems Managem ent Discovery Accelerators Hadoop System Analytic Applications Exploration / Visualization Functional App Industry App Stream Computing Predictive Analytics Content Analytics Data Warehous e Information Integration & Governance Information Integration & Governance 2012 IBM Corporation

Information Management Governing Data Throughout its Lifecycle Information Governance Core Disciplines Lifecycle Management Understand & Define Develop & Test Optimize & Archive Consolidate & Retire Discover where data resides Develop database structures & code Enhance performance Integrate into single data source Classify & define data and relationships Create & refresh test data Manage data growth Move only the needed information Define policies Validate test results Report & retrieve archived data Enable compliance with retention & e-discovery 2012 IBM Corporation

Information Management Data Warehouse Challenges Related to Data Growth Increasing Costs and Time to Market Poor Data Warehouse Performance Managing Risk & Compliance The impact of exponential data growth on infrastructure and operational costs to keep up with data volumes. Slow-performing business intelligence (BI) & analytics solutions due to unchecked data growth. The keep everything strategy impacts supporting the complex data retention and disposal compliance. 2012 IBM Corporation

Information Management Technical Challenges with High Volume Data Growth No policies and processes to manage multi-temperature data cost-effectively Transaction & data volume growth drive up infrastructure costs Multiple instances of production data backup, DR, training, test can compound data growth Large production instances affect cost & operations of backups, DR, non-prod Archival processes are complex requiring full business context for e-discovery 28 2012 IBM Corporation

Information Management What is Data Archiving? Manage data growth and improve performance by intelligently archiving historical data Universal Access to Application Data (ODBC/JDBC) Application Report Writer Requirements Archive, manage and retain application data according to business policies Maintain Referential Integrity Support tiered archiving to secondary database or file system Benefits Reduce hardware, storage and maintenance costs Improve application performance Support data retention or disposal requirements 2012 IBM Corporation

Information Management Tiered Storage for Information Lifecycle Management Current Data Active Historical Online Archive Offline Archive 1-2 years 3-4 years 5-6 years 7+ years Production Database Archive Restore Archive Reporting Database Non DBMS Retention Platform ATA File Server EMC Centera IBM RS550 HDS Offline Retention Platform CD Tape Optical Archive Definition s Compressed Archives Compressed Archives Compressed Archives 2012 IBM Corporation

Information Management Benefits of Archiving for the Data Warehouse Reduce Costs Reduce TCO by leveraging a tieredstorage archiving strategy -Reclaim space & defer hardware upgrades -Reduce operational and storage costs Improve Performance Increase performance by archiving out dormant data. -Faster analytical processing, ETL and reporting. -Faster backup, restores and maintenance tasks. Minimize Risk Support data retention Archive or legal hold Archive requirement. -Capture the precise sets of data required based on your audit or e- discovery. -Support defensible disposal of data 2012 IBM Corporation

Information Management About IBM InfoSphere Optim Solutions Discover Test Data Management Data Masking Manage Data Growth Application Retirement Partner-delivered Solutions Single, scalable, information lifecycle management solution provides a central point to deploy policies to extract, archive, subset, and protect application data records from creation to deletion 2012 IBM Corporation

Information Management Learn more Webpage: InfoSphere Optim Solution Sheet: InfoSphere Optim Solutions for the data warehouse Demo: InfoSphere Optim archiving Whitepaper: Control Application Data Before it Controls Your Business www.ibm.com/optim 2012 IBM Corporation

Information Management 2012 IBM Corporation

Questions? 35

Contacting Speakers If you have further questions or comments: Colin White, BI Research cwhite@bi-research.com Vineet Goel vineetg@us.ibm.com