Analyze Big Data Faster and Store It Cheaper
Dominick Huang, CenterPoint Energy
Russell Hull, SAP
ABOUT CENTERPOINT ENERGY, INC.
- Publicly traded on the New York Stock Exchange
- Headquartered in Houston, Texas
- Over 5,000 square miles of electric transmission and distribution service area
- Assets total more than $22 billion
- More than 8,700 employees
- CNP and its predecessor companies have been in business for over 130 years
- Domestic energy delivery: operate, serve, and grow
- Smart grid enabled
- Twenty-eight state geography
- Over five million metered customers
- 2.3 million smart meters
- 4,000 miles of transmission
- 47,000 miles of distribution
- Electric Transmission & Distribution; Natural Gas Distribution; Competitive Natural Gas Sales and Services
CenterPoint Energy Proprietary and Confidential
AGENDA
- Key Drivers and Strategy of HANA Initiative
- Use Case: Smart Meter Big Data Analytics
- Technology Overview
- POC Results
- Value and Comparison
KEY DRIVERS FOR HANA INITIATIVES
- SAP HANA as CNP's strategic platform for critical transactional applications and analytics
- Cost-effective solution to manage and contain data storage growth
- Analytics platform simplification and consolidation onto HANA
- Key technology enabler for future business solutions
- Maximize CNP's investment in the HANA license (40TB)
- Enable business resiliency implementation for CRM/ECC/BPC
- Leverage HANA in-memory capability for real-time analytics
STRATEGY: 3-YEAR HANA ROADMAP
Technical Migration and Consolidation
- Migrate critical business applications (SAP and Mainframe)
- Consolidate analytics solutions (BW, ISAS, ema, etc.) onto HANA
HANA Platform Optimization
- Enhance performance of core business processes and mass business functions
- Enable real-time reporting from the HANA (in-memory) database
HANA Platform Innovation
- Innovative solutions to align with long-term business strategy and roadmap
- SIMPLE Finance, Predictive Asset Health Analytics, Situational Awareness, Internet of Things, Predictive Analytics for customer services, etc.
USE CASE SMART METER BIG DATA ANALYTICS
BUSINESS CHALLENGE: 1+ PB OF SMART METER DATA
- 2.3MM smart meters take readings every 15 minutes, creating about 225MM readings per day, or roughly 82 billion readings per year (over 800 billion across the 10-year retention period).
- Regulatory requirements call for historical readings to be available for 10 years.
- Uncompressed data grows by 8TB per month, adding up to over 1PB in a 10-year period.
- The current DW technology is approaching end of life.
- Massive amounts of data are stored in a proprietary vendor solution that is hard to manage and has a significantly high total cost of ownership.
- A cost-effective solution is needed for today's analytics, regulatory requirements, and preparation for future use cases.
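The volumes on this slide can be sanity-checked with a little arithmetic. The sketch below assumes exactly one reading per meter every 15 minutes, a 365-day year, and decimal units (1 PB = 1,000 TB); the constant and variable names are illustrative, and the results are rounded estimates rather than CenterPoint figures.

```python
# Back-of-the-envelope check of the smart meter data volumes.
# Assumptions (not stated on the slide): every meter reports exactly once
# per 15-minute interval, a year has 365 days, and 1 PB = 1,000 TB.

METERS = 2_300_000                            # 2.3MM smart meters
READINGS_PER_METER_PER_DAY = 24 * 60 // 15    # one reading every 15 minutes
RETENTION_YEARS = 10                          # regulatory retention period
GROWTH_TB_PER_MONTH = 8                       # uncompressed growth

readings_per_day = METERS * READINGS_PER_METER_PER_DAY
readings_per_year = readings_per_day * 365
readings_retained = readings_per_year * RETENTION_YEARS
storage_tb = GROWTH_TB_PER_MONTH * 12 * RETENTION_YEARS

print(f"readings per day:       {readings_per_day / 1e6:.1f} million")
print(f"readings per year:      {readings_per_year / 1e9:.1f} billion")
print(f"readings over 10 years: {readings_retained / 1e9:.1f} billion")
print(f"uncompressed storage:   {storage_tb} TB (~{storage_tb / 1000:.2f} PB)")
```

Under these assumptions the daily count comes out near 221 million readings, the yearly count near 81 billion, and the ten-year retained count near 806 billion, with about 960 TB of uncompressed growth over the retention period.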
DATA TIER SOLUTION
DATA VOLUME MANAGEMENT: MULTI-TEMPERATURE DATA APPROACH
Hot
- Data is read and/or written frequently
- In memory
- No restrictions, all features available
Warm
- Non-active data concept
- Infrequent access
- On disk, no need to keep in memory all the time
- No restrictions, all features available
Cold
- NLS management for read-only data
- Sporadic access
- Not stored in the HANA DB; stored in near-line storage
- Restricted to NLS capabilities
Providing lower TCO through optimized data volume management
BUSINESS CASE: CAPEX & OPEX SAVINGS
[Chart: Projected Data Capacity (TB) by year from 2016: 280, 380, 480, 580, 680, 780, 880, 980, 1080, 1180]
[Chart: Projected Total Spend in $ millions (cumulative and estimated), 2016 to 2026, comparing business as usual (NZ Capital, NZ O&M) with a move to HANA/Hadoop (HANA Capital, HANA O&M) and showing the resulting Capex and O&M savings]
- Projected savings: 75% Capex and Opex saving
- Smart meter data grows by more than 100TB per year, reaching 1PB+ in 10 years
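The capacity values on this slide follow a simple linear trend, which can be written down as a small projection. This is only a sketch: it assumes the chart values 280 to 1180 TB map in order to the years 2016 to 2025 and that growth stays at a flat 100 TB per year; the function names are hypothetical.

```python
# Linear projection of smart meter data capacity.
# Assumptions: 280 TB in 2016 (first chart value) and a constant growth of
# 100 TB per year, so that the last chart value (1180 TB) falls in 2025.

START_YEAR = 2016
START_TB = 280
GROWTH_TB_PER_YEAR = 100


def projected_capacity_tb(year: int) -> int:
    """Return the projected capacity in TB for the given year."""
    if year < START_YEAR:
        raise ValueError(f"projection starts in {START_YEAR}")
    return START_TB + GROWTH_TB_PER_YEAR * (year - START_YEAR)


def first_year_reaching(target_tb: int) -> int:
    """Return the first year in which the projection reaches target_tb."""
    year = START_YEAR
    while projected_capacity_tb(year) < target_tb:
        year += 1
    return year


if __name__ == "__main__":
    for year in range(START_YEAR, 2026):
        print(year, projected_capacity_tb(year), "TB")
    print("1 PB (1000 TB) reached in", first_year_reaching(1000))
```

With these assumptions the projection passes 1,000 TB in 2024 and reaches 1,180 TB in 2025.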
SOLUTION BENEFITS
- Cost-effective HOT + WARM + COLD data management strategy leveraging HANA data compression and data tiering technology
- Simplified Big Data ownership by combining SAP HANA, Dynamic Tiering, and Hadoop into a single landscape
- Single database experience: query execution uses SDA and automatically accesses data stored in HANA, Dynamic Tiering, or Hadoop/Vora, depending on where the data resides
- Data movement between storage tiers is automated by the Data Lifecycle Manager (DLM)
- Foundation for advanced predictive analytics and future business capabilities
- Instant, real-time analytics via HANA
- 75% savings in storage cost compared to the current solution
- Data tiering technology (Dynamic Tiering, Hadoop) to manage data size and growth
- Seamless Hadoop integration allows data scientists to use the HANA toolset to access and manage Hadoop data
- Ability to charge the business based on the data being stored and its performance requirements
TECHNOLOGY REVIEW
SAP Big Data Platform
NEW SMART METER ANALYTICS ARCHITECTURE
[Diagram: current architecture versus planned architecture]
- Application layer: Business Objects / SAS / custom applications
- Current architecture: Netezza, z/OS
- Planned architecture: storage tiers ordered by cost and performance (speed layer and batch layer), with DLM handling aggregation and aging
1. Tier 0 (memory): HANA EDW, 36TB. 13 months of data are stored in HANA for fast analytics.
2. Tier 1 (SAN, ...): Dynamic Tiering extended storage, 50TB. 26 months of data are stored in DT (Sybase IQ).
3. Tier 2 (Hadoop): Hadoop (Vora), 750TB. 10 years of meter data are stored in Hadoop. The plan is to use SAP HANA Vora to access the data.
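The retention windows on this slide suggest an age-based routing rule. The sketch below is a minimal illustration, assuming the windows are cut-offs on the age of a reading (under 13 months in HANA, under 26 months in Dynamic Tiering, up to 10 years in Hadoop) and that a query goes to the fastest tier holding the reading. The slide does not say whether the windows overlap, and `Tier`, `age_in_months`, and `tier_for_reading` are hypothetical names, not part of DLM or any SAP API.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Tier(Enum):
    HANA = "hot (in-memory)"
    DYNAMIC_TIERING = "warm (disk, extended storage)"
    HADOOP = "cold (HDFS)"


# Assumed cut-offs in months, read off the slide as cumulative age limits.
HOT_MONTHS = 13
WARM_MONTHS = 26
RETENTION_MONTHS = 10 * 12


def age_in_months(reading_date: date, today: date) -> int:
    """Whole calendar months elapsed between reading_date and today."""
    months = (today.year - reading_date.year) * 12 + (today.month - reading_date.month)
    if today.day < reading_date.day:
        months -= 1  # the current month is not yet complete
    return months


def tier_for_reading(reading_date: date, today: date) -> Optional[Tier]:
    """Return the fastest tier assumed to hold the reading, or None if expired."""
    age = age_in_months(reading_date, today)
    if age < 0:
        raise ValueError("reading date lies in the future")
    if age < HOT_MONTHS:
        return Tier.HANA
    if age < WARM_MONTHS:
        return Tier.DYNAMIC_TIERING
    if age < RETENTION_MONTHS:
        return Tier.HADOOP
    return None  # older than the 10-year retention period


if __name__ == "__main__":
    today = date(2016, 6, 15)
    for d in (date(2016, 1, 10), date(2015, 1, 10), date(2010, 1, 10)):
        print(d, "->", tier_for_reading(d, today))
```

Keeping the cut-offs as named constants makes it easy to try the other reading of the slide, in which the 26 months in DT come on top of the 13 months in HANA.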
DYNAMIC TIERING
SAP Dynamic Tiering is a warm store: a traditional disk-based database system fully integrated into HANA.
- Based on Sybase IQ: column store, disk based
- Reduces TCO by lowering the HANA memory footprint
- All HANA functions are available: read, write, update
- Single database experience: all DB access requests are managed through the HANA platform
- Centralized operation control: all administration tasks are handled through the HANA interface
SAP HANA DYNAMIC TIERING DISK-BACKED COLUMN STORE EXTENSION TO HANA FOR WARM DATA MANAGEMENT
WHAT IS APACHE HADOOP?
HADOOP TECHNICAL ARCHITECTURE HADOOP CLUSTER
SAP VORA - HANA/HADOOP INTEGRATION: WHAT'S INSIDE AND WHAT DOES IT DO?
SAP HANA Vora is an in-memory query engine that leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.
- What's inside: drill-downs on HDFS, mashup API enhancements, compiled queries, HANA-Spark adapter, unified landscape, open programming
- What it does: make precision decisions, democratize data access, simplify Big Data ownership
- Any Hadoop cluster
SAP DATA LIFECYCLE MANAGER (DLM)
SAP DATA TIERING ARCHITECTURE
[Diagram: components and data flows]
- HANA: Index Server, XS Engine, Data Lifecycle Manager (DLM), SDA (virtual table), Dynamic Tiering extended storage
- Hadoop: HANA Spark Controller, Spark processing engines, Spark SQL, Vora in-memory stores, HDFS files
- Flow labels: DLM reads data from HANA; upload table into Vora; DLM writes data; DLM writes data to ORC file
POC REVIEW
POC OBJECTIVES
- Research and test SAP HANA data tiering technology, i.e. DLM (Data Lifecycle Manager), Dynamic Tiering, and Vora Hadoop integration
- Evaluate Hadoop technology; understand the Hadoop ecosystem and TCO
- Test SAP Vora, the HANA and Hadoop integration technology
- Develop and validate solution options for several critical 2016 projects: Smart Meter Analytics and the customer document repository for Mainframe Migration
- Build CNP in-house expertise in Hadoop and SAP HANA/Hadoop integration technology
- Identify use cases and innovation opportunities at CNP
POC ENVIRONMENT AND TEST CASES
POC Team
- CenterPoint Energy (lead and architects); SAP (CoE, PE, Global ITP); HP (hardware); IBM (IBM Hadoop and cloud)
Environment
- Hardware: HP Lab with a 12-node Hadoop cluster, CS500 HANA, and a HANA Dynamic Tiering node; IBM BigInsights Cloud
- Software: SAP HANA SPS 10, DLM, Dynamic Tiering, Vora; Hortonworks HDP Hadoop, Red Hat Linux; IBM Apache Hadoop with BigSQL
Test Cases
- Data load: extract 800GB (7 billion smart meter records) from Netezza and ISAS and load the data into HANA (meter data scrambled to protect data security)
- DLM: use the DLM tool to move data from HANA to Dynamic Tiering extended storage and Hadoop
- Run queries across all data tiers and measure performance
- Load, query, and display 19 million PDFs of customer bills (dummy PDF files used, no real customer data)
POC SUCCESS CRITERIA
- Data tiering: move data among the different tiers, including HANA, DT, and Hadoop; run SQL queries within and across data tiers
- Performance: measure response time for each data tier
- Data compression: evaluate the compression ratio of HANA, DT, and Hadoop
- SAP DLM: use the tool to move data from the hot tier to the warm and cold tiers
- Customer document storage: store and retrieve PDF documents within one second
- Comparison of storage costs: HANA, DT (Dynamic Tiering extended storage), and Hadoop
POC TEST RESULTS
Hadoop
- HDP customer bill store and retrieval: 40ms response time to search and display a document from 19 million PDFs
- HDP batch data load via Sqoop into Hadoop: 4 min 24 s to load 2.5 million records (single thread); 1 min 10 s (10 threads)
HANA / DT / Spark / Vora
- Data load from HANA to HDP Hadoop via Vora: a total of 6.2GB of ORC files stored in HDFS against an original size of 172GB. Compression rate: 9 (3 copies in HDFS)
- Aggregation query across SAP HANA, HDP Hadoop, and DT (~4 billion records)
[Chart: Query Response Time [s], axis 0 to 400; plotted values 0.2, 2.6, 360, 19]
DLM
- Move data from HANA to DT: 289 million records moved at 670K records per minute
- Move data from HANA to Hadoop via Vora into HDFS: 1.57 billion records moved at 22 million records per minute
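Several of these results can be cross-checked against each other. The sketch below is a rough calculation, assuming the 6.2GB ORC figure is the size of a single copy (so three HDFS copies occupy three times that) and that the quoted records-per-minute rates were sustained for the whole transfer; the helper names are illustrative only.

```python
# Cross-check of the POC figures.
# Assumptions: 6.2 GB is the size of ONE copy of the ORC files, HDFS keeps
# 3 copies, and the records-per-minute rates are sustained averages.

ORIGINAL_GB = 172
ORC_GB = 6.2
HDFS_COPIES = 3


def compression_ratio(original_gb: float, stored_gb: float, copies: int = 1) -> float:
    """Original size divided by the space actually occupied on disk."""
    return original_gb / (stored_gb * copies)


def transfer_minutes(records: int, records_per_minute: int) -> float:
    """Minutes needed to move `records` at a constant rate."""
    return records / records_per_minute


per_copy_ratio = compression_ratio(ORIGINAL_GB, ORC_GB)
effective_ratio = compression_ratio(ORIGINAL_GB, ORC_GB, HDFS_COPIES)

dt_minutes = transfer_minutes(289_000_000, 670_000)            # HANA -> DT
hadoop_minutes = transfer_minutes(1_570_000_000, 22_000_000)   # HANA -> Hadoop

# Sqoop load of 2.5 million records: 4 min 24 s single-threaded, 1 min 10 s with 10 threads
sqoop_speedup = (4 * 60 + 24) / (1 * 60 + 10)

print(f"compression per copy:        {per_copy_ratio:.1f}:1")
print(f"compression with 3 copies:   {effective_ratio:.1f}:1")
print(f"HANA -> DT transfer:         {dt_minutes / 60:.1f} hours")
print(f"HANA -> Hadoop transfer:     {hadoop_minutes / 60:.1f} hours")
print(f"Sqoop speedup at 10 threads: {sqoop_speedup:.1f}x")
```

On this reading, the reported compression rate of 9 is the effective ratio after the three HDFS copies are counted, while a single copy compresses roughly 28:1. The move to Hadoop handles over five times as many records as the move to DT in about a sixth of the time.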
VALUE AND COMPARISON BETWEEN DATA TIERS
COMPARISON BETWEEN DATA TIERS
Columns: Component; Performance; Cost Factor; Volume; Processing
HANA
- Cost factor: $$$$
- Volume: up to 10s of TBs (no technical limit)
- Processing: ACID compliant; SQL, SQLScript, graph, time series, spatial, text, ...
Dynamic Tiering or Sybase IQ
- Cost factor: $$
- Volume: 100s of TB integrated in HANA; several PBs with Sybase IQ
- Processing: ACID compliant; SQL
Hadoop Spark/Vora
- Cost factor: $ (15 times less expensive than T1 storage)
- Volume: 100s of PB or more
- Processing: ANSI SQL compliant; read-only SQL when used from HANA via SDA; transformations and actions
- Performance can be improved significantly by adding compute nodes and using SSDs, at higher cost
Hadoop Vora in Memory
- Cost factor: $$
- Volume: 100s of TB (depending on available memory in the Hadoop cluster)
- Processing: data loaded in memory to achieve better performance; read-only SQL when used from HANA via SDA
RECOMMENDED USE CASES: SHORT TERM
HANA
- Managing up to several TBs of high-value data
- Very high processing performance required
- SAP HANA native processing features (PAL, ...) required
- OLTP with many fine-granular updates needed
Dynamic Tiering
- Managing up to several PBs of data at T2/T3 storage cost
- High performance for complex queries required
- Deep SAP HANA integration required (single database experience)
- Updates and deletes required
Hadoop - Spark
- Managing up to 100s of PBs of data at T4 storage cost, 15 times less expensive than T1 storage
- Read-only access sufficient (bulk load, no fine-granular writes)
- Comparatively low-cost storage important
- Loose integration of administration and life-cycle management acceptable
Hadoop - Vora
- High OLAP query performance on Hadoop
- Additional query features (hierarchies)
THANK YOU
Contact information:
Dominick Huang, Sr. Manager, Enterprise Technology & Architecture, CenterPoint Energy, Yong.huang@centerpointenergy.com, Tel 713-207-6659
Russell Hull, Chief Support Architect, SAP America, Russell.hull@sap.com
FOLLOW US
Thank you for your time. Follow us at @ASUG365
APPENDIX
CNP HANA LANDSCAPE - ANALYTICS (BW + OW)
[Diagram: Analytics (BW + OW) HANA systems HIP (PRD, 36TB memory), HIQ (QA), HID (DEV), and HIS (SBX), built from 0.5TB nodes (one 0.25TB node) and attached to ES]
- Legend: existing blade; new HP node (2TB); failover blade; ES = Extended Storage (NLS/DT/Hadoop)
- Note 1: Situation Awareness, MfM Testing & other apps, 4.5TB
HADOOP ARCHITECTURE