Building data-driven applications with SAP Data Hub and Amazon Web Services


1 Building data-driven applications with SAP Data Hub and Amazon Web Services
Dr. Lars Dannecker, Steffen Geissinger
September 18th, 2018

2 Enterprise data landscapes are growing increasingly complex
Cross-department disconnect between Business and IT: "I need more apps!" "What's the quality?" "Where's my new data?" "Why so slow?"
Landscape: Enterprise Apps (ERP, CRM, HR), BI and Visualization ("Need better dashboards"), Mobile Apps, Cloud Apps, Master Data Management
Data silos: Cloud Storage, Data Marts, Data Lakes, and EDWs held separately by R&D, Manufacturing, Sales & Marketing, and Outside Partners
LANDSCAPE CHALLENGES
- GOVERNANCE: Lack of security and visibility. Who changed the data? What was changed? Who is accessing it?
- LIMITED TOOLS: Lack of enterprise readiness. High effort to productize complex data scenarios across the data landscape.
- MISSING LINK: Between Big Data and Enterprise Data. Data is kept in silos across the enterprise.

3 Example: Advanced Fitness Tracker Analysis
Big Data: heart rate, battery, speed, distance
Enterprise Data (e.g. ERP): sales figures, revenue, margin, customer groups, transactions, customer master data, location, details, sales order
Intelligent Enterprise: Machine Learning, Data Mining, Advanced Applications, Intelligent Assistants


5 SAP Data Hub: Integrate, orchestrate, process
Systems shown (grouped as Enterprise Systems and Distributed Data Systems): SAP ERP, SAP HANA, SAP S/4HANA, SAP BW/4HANA, Hadoop, Cloud Storage, Machine Learning
SAP Data Hub layers: Data Discovery and Metadata Governance; Orchestration & Data Pipeline; Connectivity, Integration, Ingestion; Data Hub Runtime
- Holistic Landscape Orchestration
- Flow-based data processing
- Enterprise and 3rd-party connectivity

6 Data Hub is the answer for enterprises' modern data management challenges
Data Visibility
- Current challenge: Growing difficulty in managing and orchestrating large volumes of data coming from SAP and non-SAP systems
- Data Hub's value proposition: Seamlessly orchestrates data from SAP and non-SAP systems and provides a 360-degree view of ALL company data
Data Quality
- Current challenge: Constant need to improve data quality by cleansing and resolving inconsistencies
- Data Hub's value proposition: Offers one cross-landscape control center to monitor and improve data quality
Data Innovation
- Current challenge: Complexity in utilizing data from SAP and non-SAP systems for Machine Learning training and IoT use cases
- Data Hub's value proposition: Efficiently streams and processes data from all sources to enable new Machine Learning and IoT use cases
Data Cost
- Current challenge: Increasing storage and compute costs due to growing data volumes (organically or via M&A)
- Data Hub's value proposition: Optimizes data costs by eliminating data duplications and data movement
Data Compliance
- Current challenge: Enforcing multiple corporate and regulatory data policies is becoming a burden and risk for enterprise IT
- Data Hub's value proposition: Manages all data compliance & governance policies of a company in one central location

7 SAP Data Hub: Architectural overview
Connected Systems: SAP S/4HANA, SAP BW/4HANA, SAP Data Services, SAP LT Replication Server, SAP HANA, SAP Cloud Applications (API-driven)
Metadata & Applications: Metadata Management, Self-service Data Preparation, Pipeline Development, Data Workflows, Access Governance, API Access, Scheduling, Profiling & Discovery
Distributed Runtime:
- SAP Vora Engines: Relational, Time-Series, Graph, Document
- Pipelines & Workflows: Scripting (JS, Python), Built-in Connectors, Templates, Flow-based Applications, Custom Operators
SAP Data Hub System Management (based on SAP HANA): Multi-Tenancy, User & Access Management, Content Lifecycle Management, Cluster Management, Metadata Catalog, Application Services, Diagnostics, Connectivity
Data Storages: Cloud / On-Premise Object Stores, Hadoop HDFS (optional), SAP Data Hub Adapter, Vora Spark Extensions
Native Integration with AWS Offerings
Open connectivity for 3rd-party & open source

8 SAP Data Hub data pipelines concept
SAP Data Hub Pipelines: a chain of operations (logic) connected by flows of data
Pipelines = Computation Graphs
Execution environments for operators:
- Docker containers as execution environments
- Operator groups run in the same Kubernetes Pod
- Groups with multiplicity allow parallel execution
Sample operators: Amazon Kinesis Consumer, JavaScript Operator, S3 File Producer, HTTP Client
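To make the "Pipelines = Computation Graphs" idea concrete, here is a minimal sketch in plain Python. It does not use the SAP Data Hub API; the `Pipeline` class, the operator names, and the message format are all hypothetical and only illustrate operators as graph nodes with data flowing along the edges.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Dict, List


class Pipeline:
    """Toy computation graph: operators are nodes, data flows along edges.

    Hypothetical illustration only, not the SAP Data Hub runtime.
    """

    def __init__(self) -> None:
        self.operators: Dict[str, Callable[[Any], Any]] = {}
        self.edges: Dict[str, List[str]] = defaultdict(list)

    def add_operator(self, name: str, logic: Callable[[Any], Any]) -> None:
        self.operators[name] = logic

    def connect(self, source: str, target: str) -> None:
        self.edges[source].append(target)

    def run(self, entry: str, message: Any) -> List[Any]:
        """Push one message through the graph and collect sink outputs."""
        results: List[Any] = []
        queue = deque([(entry, message)])
        while queue:
            name, payload = queue.popleft()
            output = self.operators[name](payload)
            if output is None:  # the operator dropped the message
                continue
            targets = self.edges.get(name, [])
            if not targets:  # no outgoing edge: this operator is a sink
                results.append(output)
            for target in targets:
                queue.append((target, output))
        return results


# Three chained operators: parse a raw reading, convert it, keep only high values.
pipeline = Pipeline()
pipeline.add_operator(
    "parse", lambda raw: dict(zip(("device", "heart_rate"), raw.split(",")))
)
pipeline.add_operator(
    "to_int", lambda rec: {**rec, "heart_rate": int(rec["heart_rate"])}
)
pipeline.add_operator(
    "filter_high", lambda rec: rec if rec["heart_rate"] > 150 else None
)
pipeline.connect("parse", "to_int")
pipeline.connect("to_int", "filter_high")

alerts = pipeline.run("parse", "tracker-1,172")
print(alerts)
```

In the real product each operator would run inside a container rather than as an in-process function; the sketch only shows the graph structure.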

9 Easily design complex pipelines: Pipeline Editor
- Navigation Pane: Use this pane to access operators, graphs, the repository, and the types.
- Pipeline (Graph) Editor: Use this editor to create a pipeline (graph) with one or more operators.
- Editor Toolbar: The graph editor includes a toolbar, which you can use to perform operations on the graph, for example, to save and execute a graph.
- Status Pane: You can use this pane to monitor the status of the graph execution, trace messages, and view various logs.

10 Building Data Driven Applications

11 Why should I even think about hybrid?
High Value Enterprise Data (Customer Data Center): ERP Data, Customer Data, Sales Data
Cloud capabilities: Machine Learning, Cloud Storage, IoT/Stream Processing, Advanced Analytics

12 AWS & Data Hub: Jointly processing for improving your analysis
Sources and targets (shown on both sides of SAP Data Hub):
- Cloud Service: Cloud Storage, Messaging Services, Cloud Databases, Machine Learning, Internet of Things
- SAP Systems: SAP HANA, SAP BW/4HANA, ERP, S/4HANA
- SAP Applications: SAP Cloud Platform, Data Services
Deployment categories shown for sources and targets: 3rd Party, Cloud, Managed Service, On-Premise
SAP Data Hub capabilities: Connection Management, Metadata Catalog, Data Preparation, Data Joins, Data Pipelines and Transformations
Processing steps:
- Landscape Orchestration: Connect and organize your entire system landscape to get a holistic overview.
- Metadata Governance: Get insights into the characteristics, profiles, and data models of data residing in connected systems.
- Data Refinement: Define the necessary steps to refine and shape the data with respect to your specific processing steps.
- Joint Processing: Join data from different sources and continue processing them together.
- Advanced flow-based applications: Conduct advanced transformations or apply machine learning by building powerful pipelines.
Data Hub storage: Disk-Based Data Hub Storage (EBS Volumes), Data Hub Data Lake (S3 Storage)

13 AWS & SAP Data Hub: Major use case scenarios
Rapidly integrate and leverage new data sources (Big Data warehouse use case)
- Acquire data from enterprise and cloud sources
- Combine structured and unstructured data
- Seamlessly move selected data sets across landscapes
Understand real-world performance (Internet of Things use case)
- Combine streaming sources with static enterprise sources
- Support for high-velocity data ingestion and processing
- Scale-out ingestion processing and pipelining
Machine Learning and Predictive Analytics (Data Science use case)
- Apply machine learning to any data set
- Operationalize and automate Machine Learning processes
- Wide variety support

14 Big Data Warehouse

15 Modern landscapes: data perspective
Insights out of (Big) Data:
- Structuring the unstructured
- Handling large volumes of data
- ETL or DWH are not the answer
- Data formats and granularity
- Data streams and flexible structures
- Apply logic to the data, not data to the logic
- Integrated analytics
- Early insights on all levels
- Automation needed
Processing levels:
- Enterprise data warehouse: analytical modeling, joins/unions, de-normalization, matching/duplicate check
- Refining: metadata extraction and generation, enrichment, cleansing, (re-)formatting, anonymization/masking
- Raw data: parsing, filtering, search, data validation, data stream
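The raw-data and refining steps named on this slide (parsing, data validation, filtering, cleansing, masking) can be sketched with the Python standard library. The sample data, column names, and helper functions below are hypothetical, and hashing the e-mail address is only one simple masking option (strictly speaking it is pseudonymization rather than full anonymization).

```python
import csv
import hashlib
import io

# Hypothetical raw extract with typical quality problems.
RAW = """customer_id,email,revenue
C001, Alice@Example.com ,120.50
C002,,80
C003,bob@example.com,not-a-number
"""


def parse(text):
    """Parsing: turn raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def is_valid(row):
    """Data validation: require a numeric revenue and a non-empty e-mail."""
    try:
        float(row["revenue"])
    except ValueError:
        return False
    return bool(row["email"].strip())


def cleanse(row):
    """Cleansing and (re-)formatting: trim, lower-case, convert types."""
    return {
        "customer_id": row["customer_id"].strip(),
        "email": row["email"].strip().lower(),
        "revenue": float(row["revenue"]),
    }


def mask(row):
    """Masking: replace the e-mail with a truncated SHA-256 digest."""
    digest = hashlib.sha256(row["email"].encode("utf-8")).hexdigest()
    return {**row, "email": digest[:12]}


# Filtering keeps only valid rows before cleansing and masking.
refined = [mask(cleanse(row)) for row in parse(RAW) if is_valid(row)]
print(refined)
```

Only the first sample row survives validation; the other two are filtered out because of a missing e-mail and a non-numeric revenue.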

16 Vision: Intelligent Data Warehouse
- Close integration between enterprise systems and cloud services
- High automation and minimal modeling
- Data Lake as primary high-volume and computation persistency
- Scalable storage and data processing capabilities in Cloud / On-Premise
- Data processing beyond OLAP with ML / Predictive Analytics etc.
SAP / AWS Target Systems: SAP Analytics Cloud (Self Services, Business Planning, Predictive Analytics), SAP BW/4HANA (OLAP, Data Modelling), 3rd Party / Open APIs
SAP Data Hub: Meta Data Governance, Process & Orchestration, Data Management, Data Pipelines
Data Lake on Amazon S3: Massive Data Store, High Volume Compute, Advanced Data Proc.
SAP / AWS Source Systems: S/4HANA, C/4HANA, SAP ERP, SAP HANA, Amazon Aurora, SAP Applications, Cloud Storages (Amazon S3), AWS IoT, Social Media, Kinesis

17 Orchestration with SAP Data Hub
- Data Workflows
- Data replication to the cloud
- Orchestration via pipelines chaining multiple operations and execution engines
- Operators are managed and run as part of SAP Data Hub
- The heavy lifting / logic execution happens either internally (in the operator container) or externally (in the connected system)
Views shown: Design Time, Runtime

18 Realizing the Big Data Warehouse for customers
Infrastructure: Amazon EKS, Amazon S3
SAP BW/4HANA on AWS (Structured Data, Analytics, Model Integration): Meta Data Repository, Process Chains, OpenODS View, CompositeProvider, Advanced DSO, SDA / SDI
Connected sources: S/4HANA, Amazon Redshift (via Data Services)
SAP Data Hub: Meta Data Catalog; SAP Data Hub Runtime with SAP Vora (Refined Data, Data Pipelines & Flows, Big Data Processing); SAP Data Services (Data Flows)
Raw Data (S3 Data Lake): .csv, .parquet
Ingestion: ERP, Databases, Social Media, Kinesis, Amazon MQ

19 Demo

20 IoT Understanding Real-World Performance

21 SAP Data Hub use case scenario: Internet of Things (IoT)
Data Hub and AWS core capabilities:
- Unite streaming data (sensors) with enterprise data (business metadata)
- Event-based execution and processing
- Scaling to 1,000s of pipelines in parallel at any time
- Automate, design, and run all data processes
Examples:
- Information from Internet-enabled devices
- Customer demographics
- Supply chain information
- Granular product usage information
- Apply the concept of Digital Twins to data streams, enabling customers to test outcomes and impacts of potential actions
Layers covered by SAP Data Hub and Amazon Web Services: Enterprise IoT applications; Data processing and orchestration; Big Data lakes; Ingestion and stream; IoT gateway and services; Sensors
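Uniting streaming sensor data with enterprise data, processed event by event, can be sketched as a simple enrichment step. This is a plain-Python illustration under stated assumptions: the `DEVICE_MASTER` lookup stands in for an enterprise system, `sensor_stream` stands in for a stream consumer, and neither is an SAP Data Hub or AWS API.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical enterprise master data keyed by device ID (stand-in for an ERP lookup).
DEVICE_MASTER: Dict[str, Dict[str, str]] = {
    "tracker-1": {"customer_id": "C001", "product": "Tracker Pro"},
    "tracker-2": {"customer_id": "C002", "product": "Tracker Lite"},
}


def sensor_stream() -> Iterator[Dict]:
    """Stand-in for a streaming source; yields one sensor event at a time."""
    yield {"device": "tracker-1", "heart_rate": 172}
    yield {"device": "tracker-9", "heart_rate": 88}  # device unknown to the enterprise system
    yield {"device": "tracker-2", "heart_rate": 95}


def enrich(events: Iterable[Dict], master: Dict[str, Dict[str, str]]) -> Iterator[Dict]:
    """Event-based processing: join each event with its business metadata."""
    for event in events:
        business = master.get(event["device"])
        if business is None:
            continue  # skip events that cannot be matched to enterprise data
        yield {**event, **business}


enriched = list(enrich(sensor_stream(), DEVICE_MASTER))
for record in enriched:
    print(record)
```

Because `enrich` is a generator, each event is handled as it arrives rather than after the whole stream has been collected.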

22 Vision: End-to-End IoT Architecture with SAP Data Hub and AWS IoT
Zones: IoT Edge, IoT Platform, IoT Enterprise
Stages: Data Sources, Collect, Publish / Subscribe, Process (Stream / Batch), Store, Analyze / Serve
- Data Sources: Things, Files
- Collect: AWS IoT Edge, File Gateway
- Publish / Subscribe: AWS IoT Core (Events, Rules), Amazon Kinesis
- Process (Stream / Batch): SAP Data Hub (Pipeline Engine) with Spark and Custom Code; Ingestion Pipeline / Orchestration / Streaming; REFINE via operators Op 1 to Op 4
- Store: Raw Data Store as Mass Storage (Amazon S3, HDFS, e.g. EMR); SAP Vora for STRUCTURE (Processing Engines: Relational, Graph, Time Series, Document; Disk Persistency for Warm Data); SAP HANA via SDA (In-Memory Engine for Hot Data / Aggregates)
- Analyze / Serve: IoT Applications, Analytic Applications, Business Solution (e.g. ERP, CRM)

23 Example: Fitness Tracker Analytics

24 Machine Learning Intelligent Data Processing

25 SAP Data Hub use case scenario: Machine Learning and Predictive Analysis
Customer challenges:
- Leveraging machine learning algorithms beyond the data science team
- Difficulty in monetizing and scaling out machine learning across an enterprise
What we need to provide:
- Apply machine learning and predictive algorithms to any data set
- Operationalize ML processes
- Insert machine learning and predictive processing into any scenario
Examples:
- Insurance industry risk profiling
- Credit analysis and automated scoring models
- Machine failure prediction leading to automated preventative maintenance
Technologies shown: Amazon ML, TensorFlow, SageMaker, SAP Data Hub, SAP MLF

26 Machine Learning with Amazon and SAP Data Hub
Open questions: How to provide SAP data? How to productize?
SAP BW/4HANA: Meta Data Repository, Process Chains, Query, CompositeProvider, OpenODS View, Advanced DSO
SAP Data Hub: Meta Data Catalog, SAP Data Hub Runtime, Data Pipelines & Flows
Data Science Community: Amazon ML, SageMaker
Data Lake (Amazon S3): .csv, .parquet
Sources: ERP, Databases, Social Media, Kinesis, Amazon MQ

27 Example: Advanced analytics for the chemical industry in Germany (Churn Analysis)
Train Model Pipelines
- Read + prepare data from connected system
- Fit and deploy ML models in Repository
Serve Model Pipelines
- Consume ML models from Repository
- Expose prediction services via REST endpoint and WebUI
Components shown: Analytical Tools; SALES_ORDERS; Pipeline Repository (Pipelines (Graphs), Operators, Docker Files, ML Models); Pipeline Engine with Executors 1 to N running Operators 1 to 4; AWS EKS
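The split between a train pipeline (fit a model, deploy it to a repository) and a serve pipeline (load the model, answer prediction requests) can be sketched as follows. Everything here is an assumption for illustration: the one-feature threshold classifier, the `days_since_order` feature, the sample data, and the JSON file used as a "repository" are not taken from the presented solution, and the REST endpoint is reduced to a plain request handler function.

```python
import json
import tempfile
from pathlib import Path


def train(samples):
    """Train step: fit a threshold classifier (churn if days_since_order >= threshold)."""
    candidates = sorted({days for days, _ in samples})

    def accuracy(threshold):
        hits = sum((days >= threshold) == churned for days, churned in samples)
        return hits / len(samples)

    best = max(candidates, key=accuracy)
    return {"feature": "days_since_order", "threshold": best}


def deploy(model, repository, name):
    """Deploy step: store the fitted model in the (file-based) repository."""
    path = repository / f"{name}.json"
    path.write_text(json.dumps(model))
    return path


def serve(repository, name):
    """Serve step: load the model and return a request handler.

    A real service would wrap this handler in a REST endpoint.
    """
    model = json.loads((repository / f"{name}.json").read_text())

    def predict(request):
        return {"churn": request[model["feature"]] >= model["threshold"]}

    return predict


# Hypothetical training data: (days since last order, customer churned?)
samples = [(5, False), (12, False), (40, True), (75, True), (20, False), (60, True)]

repository = Path(tempfile.mkdtemp())
deploy(train(samples), repository, "churn")
predict = serve(repository, "churn")

print(predict({"days_since_order": 90}))
print(predict({"days_since_order": 10}))
```

Keeping the model in a shared repository is what lets the two pipelines run and scale independently: the serve side only needs the stored artifact, not the training code.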

28 Demo

29 Stitching it Together to Build Data-Driven Applications

30 Designing data-driven applications with Data Hub and AWS
Data-driven applications: IoT, Machine Learning, Analytics / BW, ABAP Integration
SAP Data Hub:
- Metadata Management
- Workflow Orchestration: BW Process Chains, DS Jobs, HANA Flowgraphs
- Processing & Pipelines: HANA / Vora integration
- Integration & Ingestion: Connectors (open & native protocols)
- Distributed Datalake (In-Memory, Disk, S3): In-Memory, Disk Cluster, Object Store
SAP Applications: Business Apps, SCP, SAP HANA Business Services, Cloud Integration API, SAP API Business Hub, SCI for process integration, SAP Event Bus
Amazon and distributed data sources: Amazon MQ, S3 Storages, EMR / HDFS, REST APIs, AWS Databases, Amazon Analytics, Machine Learning

31 Why Amazon is the perfect partner for data-driven applications
Managed Kubernetes Service
- Simple deployment
- Operations and SLAs
- Easy auto scaling
- Certified K8s conformance
Proven Data Lake Capabilities
- Virtually unlimited storage
- High availability & durability
- Security & Compliance
Advanced Analytics and Machine Learning
- Huge variety of machine learning
- Well-established analytics
- Native TensorFlow integration
Established Customer Base
- Largest cloud provider
- Best established for enterprise customers
- Suited for high security projects (GovCloud)

32 Vision: Hyper-flexible scaling with Amazon Fargate
Comparison shown: Classic vs. Fargate

33 Thank you.
Contact information:
Dr. Lars Dannecker, Big Data Architect, P&I Big Data
Steffen Geissinger, Big Data Architect, P&I Big Data