Efficient Data Management with Near-Line Storage
The Fast Track to SAP Knowledge
Dr. Michael Hahne, Product Manager, SAND Technology
AGENDA
- Information Lifecycle Management
- Near-Line Functionality
- Enterprise Data Warehousing
- Q & A
SAP AG 2007, SAP Skills 2007 Conference / B5
What Companies are Facing Today
- Unprecedented growth in data
  - Driven by business growth: more transactions, more customers, more everything
  - Driven by the need to keep new types of data (IM files, RFID)
  - Driven by user demands for more in-depth and on-demand analysis and reporting
  - Driven by regulatory mandates, e.g. SOX, Basel II compliance
  - Driven by reluctance to purge data, "just in case"
- Data warehouse management is challenged to meet SLA obligations
- Traditional solution: either invest heavily in hardware and consulting, or exclude data from the warehouse
  - Compromising analytical requirements arising from increasingly complex business processes
  - Disturbing the decision-making process
  - Disregarding regulatory obligations
[Figure labels: Workload Complexity, Data Growth; Costs, Performance, Ability to Meet SLA Obligations]
Why Not Just Add More Storage?
- Data volumes are growing faster than the price/performance ratio of disk storage technology improves
- Fast disks are still expensive
- Data stored in production environments requires failover and backup technology
- For every dollar a company spends on data storage devices, an estimated additional $5 to $10 is required to manage those devices over the lifetime of the equipment
- Total costs > $150,000 per TB per year
- More importantly, large volumes of data have adverse effects on system responsiveness, in areas such as:
  - Data loading performance
  - Performance of change runs, rollups, and so on
  - Backup and recovery times
  - Migration and upgrade times
Data Aging Strategy
An Information Lifecycle Management (ILM) architecture enables SAP BI data warehouse managers to:
- Keep a skinny, responsive relational database within SAP BI
- Keep all their data accessible and usable over time
- Satisfy analytic and legal requirements
- Control their budget
- Ensure system availability according to SLA obligations
Storage tiers:
- Online Database Storage: frequently read/updated data
- Near-Line Storage: infrequently read data
- Data Archiving: very rarely read data
Classic Archive vs. Near-Line
[Figure: access frequency/possibility over age of data, comparing online storage, archive with reload, and near-line storage]
Archiving (SAP BW 3.X)
- ADK-based (Archive Development Kit) archiving solution for InfoCubes and ODS objects
- Cost reduction due to storing data on alternative storage media
- Archived data must be reloaded into the SAP NetWeaver BI database for analysis purposes
NLS (SAP NW 2004s BI)
- SAP NetWeaver BI analyses have direct access to NLS data
- Availability of historic data while reducing costs
- Reloading of data into the InfoCube or DataStore Object only necessary in exceptional cases
Near-line Storage vs. BI Accelerator
NLS
- Alternative storage types with direct access capabilities for reporting and loading
- Extraction of non-frequently used, read-only InfoProvider data partitions
- Extracted partitions are deleted in the RDBMS
- NLS storage and online storage together consistently reflect the BI data persistency of an InfoProvider
- NLS data is read-only
- NLS partitioned portions of an InfoProvider are write-protected
BIA
- Replication of the BI star schema including master data
- DB volume not affected
- Roll-up and change run possible after data loads
- Optimized for fast BI query access
[Figure: BI InfoCube in the RDBMS (staging), with BIA indexing and archiving to Near-line Storage and Offline Archive; access scale: very frequently, frequently, not frequently, rarely]
Benefits of a Fundamental ILM Strategy for BI
Increase Volume
- Manage and use even larger amounts of information more effectively
- Information available for any time frame for ad-hoc analyses and rebuilds
Reduce Resource Consumption
- Reduction of hardware costs for hard drive hardware on the BW side
- Main memory and CPU as well as costs for system administration
Increase Availability
- Quicker, simpler software and release management in BW
- Reduced backup and recovery times
- Intelligent data access
Optimize Performance
- Speed up loading processes in SAP NetWeaver BI
- SAP NetWeaver BI query response times in the dialog
SAP NetWeaver 2004s BI Generic NLS Interface
[Figure: architecture diagram with data flow and control flow. Components: Analysis; InfoCube/DataStore with NLS; Data Management; DB Interface; Data Archiving Process / Data Transfer Process; Near-Line Storage Adapter; BI Database; Near-Line Storage Partner Solution; storage media: robotic tape library, optical libraries, NAS or cost-effective data medium]
Consistency between Near-Line and Online
- Analysis and reporting operate on a combination of online and near-line datasets
- The consistency of the data is an absolute prerequisite
- Archiving processes into different near-line storage levels have to fulfill transactional requirements with regard to maintaining consistency
- Archiving and deletion of data in the online database form a logical unit of work (LUW)
- Rollback mechanisms are available for individual archiving steps
- The archive takes on the character of a database
- The archived data is usually read-only
[Figure: BEx or Web reporting accesses the online DB and, through the NLS interface, the archive]
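The logical unit of work described on this slide (copy to near-line, then delete online, with rollback on failure) can be pictured with a small sketch. This is only an illustration under stated assumptions: the `Store` class, the `archive_luw` function, and the `fail_before_delete` switch are hypothetical names invented here, the stores are plain in-memory lists, and nothing below reflects how SAP NetWeaver BI or any NLS adapter is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy in-memory stand-in for a data store (hypothetical, not an SAP API)."""
    rows: list = field(default_factory=list)


def archive_luw(online, nearline, selection, fail_before_delete=False):
    """Move rows matching `selection` from online to near-line as one unit of work.

    Either the selected rows end up only in the near-line store, or both
    stores are restored to the state they had before the call.
    """
    online_before = list(online.rows)
    nearline_before = list(nearline.rows)
    try:
        selected = [r for r in online.rows if selection(r)]
        # Step 1: copy the selected slice to near-line storage.
        nearline.rows.extend(selected)
        # Step 2: verify the copy before touching the online data.
        if nearline.rows[len(nearline_before):] != selected:
            raise RuntimeError("verification of near-line copy failed")
        if fail_before_delete:
            raise RuntimeError("simulated failure before deletion")
        # Step 3: delete the archived slice from the online store.
        online.rows = [r for r in online.rows if not selection(r)]
    except Exception:
        # Rollback: restore both stores to their previous state.
        online.rows = online_before
        nearline.rows = nearline_before
        return False
    return True


online = Store([{"CALYEAR": 1990, "AMOUNT": 10}, {"CALYEAR": 2006, "AMOUNT": 20}])
nearline = Store()
print(archive_luw(online, nearline, lambda r: r["CALYEAR"] <= 1990))
print(online.rows, nearline.rows)
```

The point of the sketch is the ordering: the online deletion happens only after the near-line copy has been verified, and a failure at any step leaves both stores unchanged.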
SAND/DNA for SAP BI: What Is It?
- Software-based solution, fully integrated into the SAP NetWeaver BI 2004s infrastructure
- Data compression of at least 85%, frequently as high as 95% (depending on the data)
- Database independent, no special hardware required
- Does not require index building, but still allows any data field to be accessed within Data Transfer Processes (DTPs), BEx, or any SAP BI-certified front-end tool
- Runs on most popular operating systems (Tru64, Solaris, AIX, HP-UX, Linux, Windows)
- Can run on the same server as SAP BI, or on a different server
- Integrated with major archiving solutions, like LiveLink from Opentext, to enable full Information Lifecycle Management in accordance with SAP's recommended approach
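To make the compression figures concrete, the following minimal sketch computes the remaining footprint for the two ratios quoted on the slide. The 10 TB input volume and the function name `compressed_size_tb` are illustrative assumptions, not figures or names from the presentation.

```python
def compressed_size_tb(raw_tb, compression_ratio):
    """Return the size left after compression.

    `compression_ratio` is the fraction of the volume removed,
    e.g. 0.85 for the "at least 85%" quoted on the slide.
    """
    if not 0.0 <= compression_ratio < 1.0:
        raise ValueError("compression_ratio must be in [0, 1)")
    return raw_tb * (1.0 - compression_ratio)


# Assumed example volume of 10 TB (not a number from the slides).
for ratio in (0.85, 0.95):
    size = compressed_size_tb(10, ratio)
    print(f"{ratio:.0%} compression: 10 TB -> {size:.2f} TB")
```

Under these assumptions, 10 TB of detailed data would occupy roughly 1.5 TB at 85% compression and roughly 0.5 TB at 95%.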
NLS Interface: Summary
General aspects
- NLS fills the gap between online storage and offline storage, with its data residing neither in the BI database nor in a classic archive
- NLS data can be accessed directly for analyses and data load purposes (BI queries, DTPs, DAP reload feature)
- NLS handling is provided for InfoCubes and DataStore Objects and processed using Data Archiving Processes (DAP)
- NLS storage and online storage together consistently reflect the BI data persistency of an InfoProvider
- NLS data is read-only
- NLS partitioned portions of an InfoProvider are write-protected
Vendor-specific aspects
- NLS data can be compressed
- NLS data can be indexed
- NLS data can reside on a separate database or file system
Near-Line Storage of BW InfoCubes
[Figure: extended star schema of an InfoCube]
- Fact table (InfoCube): dimension keys C, P, T; key figures sold units, revenue, ... (sample rows: 20 / 1000 and 50 / 2500)
- Dimension tables: Time, Product, Customer; the Customer dimension has columns C, SID-Cust, SID-Group, SID-Branch, SID-Corp. (sample values: SID-Cust 522, SID-Group 170)
- SID tables for attributes: SID-Cust 522 = Cust. 3417226; SID-Group 170 = Group 12769; SID-Region 23 = Region S5
- Master data: Cust. 3417226 with SID-Region 23 and Cust. Text "Lampen Müller"
- Text tables for (language-dependent) captions: Region S5 = "Süd 5/Frankfurt"
Compression of InfoCubes

F fact table without compression
Request  Product  Period   Revenue
1        P1       2003-04  40
1        P2       2003-04  20
1        P3       2003-04  50
1        P4       2003-04  40
1        P5       2003-04  30
2        P1       2003-04  30
2        P2       2003-04  40
2        P3       2003-04  60
2        P4       2003-04  40
2        P5       2003-04  50
3        P1       2003-04  40
3        P2       2003-04  60
3        P3       2003-04  20
3        P4       2003-04  60
3        P5       2003-04  80

After compression of all requests up to Request 2:

E fact table
Product  Period   Revenue
P1       2003-04  70
P2       2003-04  60
P3       2003-04  110
P4       2003-04  80
P5       2003-04  80

F fact table
Request  Product  Period   Revenue
3        P1       2003-04  40
3        P2       2003-04  60
3        P3       2003-04  20
3        P4       2003-04  60
3        P5       2003-04  80
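The compression step on this slide can be reproduced with a short sketch: rows of the F fact table up to a given request are summed per product and period into the E fact table, and the request ID is dropped. The data is taken from the slide; the function name `compress` and the tuple and dictionary layout are assumptions made for illustration, not SAP's implementation.

```python
from collections import defaultdict

# F fact table rows from the slide: (request, product, period, revenue)
F_TABLE = [
    (1, "P1", "2003-04", 40), (1, "P2", "2003-04", 20), (1, "P3", "2003-04", 50),
    (1, "P4", "2003-04", 40), (1, "P5", "2003-04", 30),
    (2, "P1", "2003-04", 30), (2, "P2", "2003-04", 40), (2, "P3", "2003-04", 60),
    (2, "P4", "2003-04", 40), (2, "P5", "2003-04", 50),
    (3, "P1", "2003-04", 40), (3, "P2", "2003-04", 60), (3, "P3", "2003-04", 20),
    (3, "P4", "2003-04", 60), (3, "P5", "2003-04", 80),
]


def compress(f_table, e_table, up_to_request):
    """Fold F rows with request <= up_to_request into the E table.

    Returns (remaining F rows, new E table). The E table is a dict keyed
    by (product, period); the request ID is dropped during aggregation.
    """
    totals = defaultdict(int)
    for key, revenue in e_table.items():
        totals[key] += revenue
    remaining = []
    for request, product, period, revenue in f_table:
        if request <= up_to_request:
            totals[(product, period)] += revenue
        else:
            remaining.append((request, product, period, revenue))
    return remaining, dict(totals)


f_after, e_after = compress(F_TABLE, {}, up_to_request=2)
for (product, period), revenue in sorted(e_after.items()):
    print(product, period, revenue)
print(len(f_after), "rows remain in the F table")
```

Running it yields the E table values from the slide (70, 60, 110, 80, 80) and leaves the five Request 3 rows in the F table.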
Tutorial Example Cube(s) ZEDUnnI
Defining a Data Archiving Process (RSDAP)
Specifying the Selection Criteria
Creating New Archive Requests via Workbench
Relative Time Calculation
Archiving Tab in Administration Screen
Logical Unit of Work: Continue Archive Request
Request Meta Data
Reporting Aspects (Adjoint Provider)
[Figure: the Business Explorer Suite (BEx) reads through the BI adjoint InfoProvider, which combines the InfoProvider (RDBMS, BIA engine) with the NearlineProvider (NLS engine) for transparent access; there is no access to the offline archive]
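The idea of transparent access can be sketched as a query that reads the online rows and, only when asked to, the near-line rows as well. This is a conceptual illustration, not the SAP query engine: `run_query`, the `read_nearline` switch, and the row layout are hypothetical names chosen for this example.

```python
def run_query(online_rows, nearline_rows, selection, read_nearline=False):
    """Sum REVENUE over all rows that match `selection`.

    Near-line rows are included only when `read_nearline` is True,
    so the caller sees one combined result without knowing where
    each row is stored.
    """
    sources = [online_rows]
    if read_nearline:
        sources.append(nearline_rows)
    return sum(r["REVENUE"] for rows in sources for r in rows if selection(r))


# Illustrative data (assumed, not from the slides).
online = [{"CALYEAR": 2006, "REVENUE": 100}, {"CALYEAR": 2005, "REVENUE": 50}]
nearline = [{"CALYEAR": 1990, "REVENUE": 30}]
everything = lambda r: True

print(run_query(online, nearline, everything))                      # 150
print(run_query(online, nearline, everything, read_nearline=True))  # 180
```

With the switch off, only online data contributes to the result; with it on, the archived slice is included in the same query result.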
Listcube of Adjoint Near-Line Provider
Parallel Archive in Process Chain
Relative Time Calculation
Restricting the Archiving Request to COMPCODE 1000
Parallel Execution of Archiving Requests
Request Status after Process Chain Execution
Merged Deletion Process
Deletion Phase started parallel one Delete Job
Final Status after Complete Archiving Requests Execution
Reloading Requests
Reload is a Logical Unit of Work
Request Status after Reload
Reporting Aspects (Query)
[Figure: the Business Explorer Suite (BEx) reads through the BI adjoint InfoProvider, which combines the InfoProvider (RDBMS, BIA engine) with the NearlineProvider (NLS engine) for transparent access; there is no access to the offline archive]
Exemplary Query
Transparent Query Access
Query Result with and without NLS Flag set
Logical Partitioning of InfoCubes
[Figure: near-line concept. An MP spans yearly cubes 2002 to 2006 with aggregates; access labels: often, seldom, not yet; NLS; restore seldom relevant]
Package-Based Communication in the Near-Line Interface
- The Selection Profile determines the slice to be stored near-line (online vs. near-line), e.g. CALYEAR <= 1990
- The Semantic Group determines granular Data Objects, e.g. one per unique value combination of CALDAY and COMPCODE (CC1000 19900101, CC1000 19900102, CC1000 19900103, CC1000 19900104)
- Data Objects are collected into Data Packages and are transferred to the Near-Line System
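The three steps on this slide (select a slice, group it into data objects, collect the objects into packages) can be sketched as follows. This is a simplified model under stated assumptions: `build_packages`, its parameters, and the row layout are hypothetical, the year is derived from the first four characters of CALDAY, and the package size is counted in data objects, which may differ from how the real interface sizes its packages.

```python
from itertools import groupby


def build_packages(rows, selection, group_keys, max_objects_per_package):
    """Select rows, group them into data objects, and collect packages.

    - `selection` plays the role of the selection profile.
    - `group_keys` plays the role of the semantic group: one data object
      per unique value combination of these fields.
    - Data objects are collected into packages of at most
      `max_objects_per_package` objects each.
    """
    def key(row):
        return tuple(row[k] for k in group_keys)

    selected = sorted((r for r in rows if selection(r)), key=key)
    data_objects = [(k, list(g)) for k, g in groupby(selected, key=key)]
    return [
        data_objects[i:i + max_objects_per_package]
        for i in range(0, len(data_objects), max_objects_per_package)
    ]


# Illustrative rows (assumed values; only the key pattern follows the slide).
rows = [
    {"COMPCODE": "1000", "CALDAY": "19900101", "AMOUNT": 5},
    {"COMPCODE": "1000", "CALDAY": "19900101", "AMOUNT": 7},
    {"COMPCODE": "1000", "CALDAY": "19900102", "AMOUNT": 3},
    {"COMPCODE": "1000", "CALDAY": "19900103", "AMOUNT": 4},
    {"COMPCODE": "1000", "CALDAY": "19900104", "AMOUNT": 9},
    {"COMPCODE": "1000", "CALDAY": "20060101", "AMOUNT": 1},  # stays online
]

packages = build_packages(
    rows,
    selection=lambda r: int(r["CALDAY"][:4]) <= 1990,  # CALYEAR <= 1990
    group_keys=("COMPCODE", "CALDAY"),
    max_objects_per_package=2,
)
for number, package in enumerate(packages, start=1):
    print(number, [object_key for object_key, _ in package])
```

Because grouping happens before packaging, all rows of one data object always travel in the same package.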
Adjusting the Communication Packaging in the DAP
Examine the Communication via the Job Log
Lesson Learned: Near-Line Concentrates on Detailed Data
- Relieving SAP BI from detailed data
- Compressed by more than 85% with SAND/DNA
- Used as a Corporate Memory
  - Details in their pure form
  - Infrequently used detailed data
  - Just-in-case data
  - Aged and historical data
  - Legacy data
- 50%-70% of the overall data volume is in granular layers
[Figure: layers BI Accelerator, Aggregates, Reporting Cubes, Propagation, Transformation, Acquisition Layer; the Data Archiving Process (DAP) feeds the Efficient Corporate Memory]
Usage of the Corporate Memory
- Greater flexibility in responding to new analytical requirements
  - Deriving new InfoCubes or DSOs
  - Building new KPIs based on historical data
[Figure: layers BI Accelerator, Aggregates, Reporting Cubes, Propagation, Transformation, Acquisition Layer; Data Transfer Process (DTP) and Look-Up API read from the Efficient Corporate Memory]
Next-Generation EDW Layer: Anticipating the Unknown
- Storing detailed data according to business and legal requirements...
- ...and not according to data management or cost constraints
Different Archive Frequency per Layer (EDW)
- Data Marts: e.g. monthly or yearly
- Data Integration Layer: e.g. weekly or monthly
- Data Acquisition Layer: daily or more often
DTP Access to Corporate Memory
Full Transformation Flexibility with Corporate Memory
Look-Ups in the Data Flow Architecture
- Look-ups are often used, e.g. to extend data with derived attributes
- Look-up of historical data in update rules
- Ad hoc reporting, Analysis Process Designer
[Figure: Reporting Layer, Data Warehouse Layer (History Objects, Staging ODS), Acquisition Layer; look-up against the near-line object]
Usage of Look-Up API in Analysis Processes
- Data Access API: single point of access to all data, archived and non-archived
[Figure: the Analysis Process calls the Data Access API, which reads through the DB interface and the NLS API]