From Data Deluge to Intelligent Data


SAP Data Hub
From Data Deluge to Intelligent Data
Orchestrate Your Data for an Intelligent Enterprise

Data for Intelligence, Speed, and With

Today, corporate data landscapes are growing increasingly diverse and distributed. Data volume is exploding with unstructured data from the Internet of Things and social sites. And companies are storing data in multiple locations: on premise, in the cloud, in data warehouses, and on edge devices.

The SAP Data Hub solution can help you unlock this treasure trove. Use the all-in-one data orchestration solution to create powerful, scalable data pipelines that connect data sources without moving the data. Enrich the data with context and rely on data quality, integrity, and security. Accomplish dramatically faster delivery of intelligent data that your business can use to optimize processes, respond to new information and events, or expand its portfolio of digital offerings. (See Figure 1.)

Figure 1: SAP Data Hub, an All-in-One Data Solution
• Data landscape management: ingestion and connectivity; security and policy management; monitoring and deployment
• Data pipelining: modeling data pipelines and workflows; data enrichment; data preparation and quality; distributed data pipeline processing
• Data governance: data discovery; data profiling; metadata cataloging

By orchestrating and governing any type, variety, or volume of data across your entire distributed data landscape, SAP Data Hub rapidly delivers enriched, trustworthy, intelligent data to the right users with the right context at the right time and enables your enterprise to:
• Achieve excellence in customer engagement by responding to information and events with intelligent data and actions
• Optimize business operations and reduce risk and costs by gaining a better understanding of your data
• Expand your company's portfolio of digital offerings by translating internal and external data sources into business sources

Orchestrate all your data across a distributed landscape: process the data close to where it resides, move only the higher-value data, and retain a centralized view for governance.

of Intelligent Data with Data Pipelines

By building information pipelines throughout and beyond the enterprise, you can work across silos with enhanced data agility. Rapidly create data pipelines and take action by scheduling their execution as part of powerful data workflows. Experience outstanding speed by processing your data across the data landscape, close to where it resides, minimizing data movement.

DATA REFINEMENT, ENRICHMENT, AND PROCESSING

Readily create complex, multistep, reusable data pipelines to refine, augment, enrich, and process data at the source (see Figure 2). Combine different processing paradigms for disparate data sources (from stream processing to structured data refinement, from image recognition to signal processing, and more) to efficiently enrich data and extract actionable insights. Use and mix a variety of computation techniques such as OLAP, graph, time series, machine learning, and predictive analytics. Leverage existing skills and technologies such as Python, R, Apache Spark, TensorFlow, Go, and JavaScript, as well as SAP HANA software and the SAP Leonardo approach.

Figure 2: Enriching Data at the Source and Automating the Data Pipeline
• Data sources
• SAP Data Hub
  • Data landscape management: ingestion and connectivity; security and policy management; monitoring and deployment
  • Data pipelining: modeling data pipelines and workflows; data enrichment; data preparation and quality; distributed data pipeline processing
  • Data governance: data discovery; data profiling; metadata cataloging
• Data consumption

MODELING OF DATA PIPELINES AND WORKFLOWS

Process data or reuse existing code and libraries through data pipelines consisting of several predefined and customizable operations. You can access connectors to messaging systems, to databases, and to systems that store and read data. Use process operators to execute any code, or operators for type conversion, and add functions for machine learning and image processing.

As shown in Figure 3, the graphical tool for creating unified data workflows lets you orchestrate and execute the data pipelines in a given order via drag-and-drop functionality. You can create these reusable data workflows for execution through SAP Data Hub.

The modeler provides hundreds of predefined operators in the following categories:
• Connectors to messaging systems (Apache Kafka, MQTT, NATS by Apcera, and Web Application Messaging Protocol [WAMP])
• Connectors to store and read data (File by MuleSoft, Apache Hadoop Distributed File System [HDFS], Amazon S3, Google Cloud Storage [GCS], Network File System [NFS], and others)
• Operators for Web services protocols including REST and OData
• Connectors to databases (SAP HANA and the SAP Vora engine)
• Operators for data processing (JavaScript, Python, and Go programming languages)
• Operators for process execution (stateful and stateless)
• Operators for Spark and R software
• Operators for machine learning (TensorFlow and other machine learning frameworks)
• Operators for image processing (OpenCV library)
• Operators for digital signal processing
• Operators for type conversion

Figure 3: One Modeling Experience for Data Pipelines, Workflows, and Transformations
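To make the idea of chaining operators concrete, here is a minimal sketch in plain Python. It is not the SAP Data Hub modeler or its API; the names `to_float`, `flag_high`, and `run_pipeline` are hypothetical and only illustrate how a type-conversion operator and a processing operator might be composed into a reusable pipeline.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical operator type: takes a stream of records, returns a stream.
Operator = Callable[[Iterator[dict]], Iterator[dict]]


def to_float(records: Iterator[dict]) -> Iterator[dict]:
    """Type-conversion operator: parse the 'value' field as a float."""
    for record in records:
        yield {**record, "value": float(record["value"])}


def flag_high(threshold: float) -> Operator:
    """Processing operator factory: mark readings above a threshold."""
    def operator(records: Iterator[dict]) -> Iterator[dict]:
        for record in records:
            yield {**record, "high": record["value"] > threshold}
    return operator


def run_pipeline(source: Iterable[dict], operators: list) -> list:
    """Connect the operators in the given order and drain the stream."""
    stream = iter(source)
    for operator in operators:
        stream = operator(stream)
    return list(stream)


# Illustrative sensor readings (made-up data).
readings = [
    {"sensor": "a", "value": "21.5"},
    {"sensor": "b", "value": "30.0"},
]
result = run_pipeline(readings, [to_float, flag_high(25.0)])
```

Because each operator only consumes and produces a stream of records, the same operators can be reused in other pipelines or reordered without changing their code.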

with Metadata Policy

Enforcing a multitude of corporate and regulatory data consumption policies is becoming a burden and risk for enterprise IT. SAP Data Hub enables consistent and proper data usage while helping ensure data quality, security, and governance across the distributed data landscape. It allows you to profile, view, and expose the data of all your connected systems. Administrators can see where and how various data sets fit together by interpreting information about data structures, value, and quality. And with the metadata catalog, different data values, attributes, and objects, such as in SAP HANA and the SAP Business Warehouse (SAP BW) application, can be semantically translated into proper schemas.

METADATA MANAGEMENT AND CATALOGING

The metadata catalog within SAP Data Hub, as shown in Figure 4, enables you to discover and manage your metadata assets across enterprise systems with disparate sources. It provides full insight into the systems that the data went through. And it enables the right users in your organization to discover, understand, and consume information about the data with the ability to synchronize and share it.

Figure 4: Manage Metadata Assets Across Disparate Systems

DATA DISCOVERY AND PREPARATION

SAP Data Hub provides an innovative approach to self-service data preparation and also takes advantage of integration through the SAP Agile Data Preparation application. You can learn more about your data by accessing, profiling, transforming, enriching, and viewing it as you navigate distributed landscapes. With its intuitive interface, SAP Data Hub combines interactive exploration and the ability to graphically profile your data. With the data discovery feature, you can gain insights with every click.

Every transformation step defined by the user is highly structured. Typical transformations are projection (map and filter), data mask, and pivot, as well as flow-control transformations such as aggregation, join, union, and case. The data masking feature provides privacy when sensitive information needs to be protected. With SAP Data Hub, data discovery and preparation are intuitive and scalable across your enterprise.

Learn more about your data by accessing, profiling, transforming, enriching, and viewing it as you navigate a connected system.
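The transformation types named above can be illustrated with a small, generic sketch in plain Python. This is not how SAP Data Hub implements them; the sample rows and the helpers `mask_email`, `mask`, `project`, and `aggregate_sum` are hypothetical and exist only to show what a data mask, a projection (filter plus column selection), and an aggregation do to a data set.

```python
from collections import defaultdict

# Illustrative sample data (made up for this sketch).
rows = [
    {"customer": "alice@example.com", "region": "EMEA", "amount": 120.0},
    {"customer": "bob@example.com", "region": "APJ", "amount": 80.0},
    {"customer": "carol@example.com", "region": "EMEA", "amount": 50.0},
]


def mask_email(email: str) -> str:
    """Data mask: keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain


def mask(rows: list, column: str, fn) -> list:
    """Apply a masking function to one column of every row."""
    return [{**row, column: fn(row[column])} for row in rows]


def project(rows: list, columns: list) -> list:
    """Projection (map): keep only the listed columns."""
    return [{c: row[c] for c in columns} for row in rows]


def aggregate_sum(rows: list, key: str, value: str) -> dict:
    """Aggregation: sum one column grouped by another."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


masked = mask(rows, "customer", mask_email)
filtered = [row for row in masked if row["amount"] >= 80.0]  # projection (filter)
projected = project(filtered, ["customer", "region"])
totals = aggregate_sum(rows, "region", "amount")
```

Masking before projection means the sensitive column never leaves the preparation step in clear text, which is the privacy property the data masking feature is described as providing.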

of the Data Landscape

End-to-end data landscape management helps ensure you are using all your data to gain intelligence. The open and flexible architecture of SAP Data Hub gives you complete visibility and control across your diverse data landscape. You can be sure all data is under control, and you can trust the quality of the data used to make better decisions. With SAP Data Hub, you can access a variety of distributed data stores in your data landscapes.

LANDSCAPE MANAGEMENT

You can centrally manage the connectivity of distributed data using a visual and intuitive user interface. With SAP Data Hub, you can manage the software systems and connections in your landscape. Systems are stand-alone data sources in the distributed data landscape, and connections are the access points to a system through SAP Data Hub agents. This landscape management is also essential when defining different security concepts for addressing various data sensitivities in a diverse, distributed landscape. You get smooth data connectivity to various data lakes, such as Hadoop, SAP BW, and SAP HANA, as well as Google Cloud Platform (Google Cloud Storage), Amazon S3, Microsoft Azure Data Lake (ADL), and Windows Azure Storage Blob (WASB).

LAUNCHPAD

With the launchpad for SAP Data Hub, you can quickly access tools that can be applied to comprehensive scenarios. You can also embed related tools or create custom links to frequently used tools and pages. (See Figure 5.)

Figure 5: One Central Entry Point to All Services and Applications

POLICY AND SECURITY MANAGEMENT

In SAP Data Hub, policy and security management allows for creation, communication, and maintenance of policies and procedures within your enterprise. This feature helps in establishing security settings, policies for processes, modeling objects, identity control (users, groups, and roles), and security logging.

Administrators can evaluate a policy directly from the cockpit or via the policy management tab, edit an existing resource for a policy, escape the keyword in a filter, and browse existing entities while creating a resource type. Administrators can optionally add a Uniform Resource Identifier (URI) to limit user access, enabling users to see only the connections in the specified URI path. With this feature, it is also possible to make sure that policies do not contain any unused resources or users and to run simulations to test policy decisions.

Mitigate risks through creation, communication, and maintenance of policies and procedures within your enterprise.
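As a rough illustration of limiting visibility to a URI path, the following plain-Python sketch filters a list of connection paths by an allowed path. It is an assumption about how such a check could work, not SAP Data Hub's policy engine; the connection paths and the functions `is_under` and `visible_connections` are hypothetical.

```python
from pathlib import PurePosixPath


def is_under(path: str, allowed: str) -> bool:
    """True if `path` equals `allowed` or lies below it, compared per path segment."""
    candidate = PurePosixPath(path)
    root = PurePosixPath(allowed)
    return candidate == root or root in candidate.parents


def visible_connections(connections: list, allowed_path: str) -> list:
    """Return only the connections a user restricted to `allowed_path` may see."""
    return [c for c in connections if is_under(c, allowed_path)]


# Hypothetical connection paths, for illustration only.
connections = [
    "/sales/hana_prod",
    "/sales/emea/s3_archive",
    "/salesforce/export",
    "/finance/bw",
]
visible = visible_connections(connections, "/sales")
```

Comparing whole path segments rather than raw string prefixes matters here: a naive `startswith("/sales")` check would also expose `/salesforce/export`.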

MONITORING AND SCHEDULING

You can schedule the execution of data pipelines in the dedicated scheduling area. Through the monitoring dashboard, you can keep track of the status of the data workflows and pipelines that you have scheduled for execution within the unified modeling view of SAP Data Hub. You can also suspend or resume them if necessary. Finally, by scheduling data pipelines in large cluster environments, you can handle batch-driven jobs and streaming jobs in a single environment.

STREAMLINED DEPLOYMENT IN CLOUD ENVIRONMENTS AND ON PREMISE

For deployment, SAP Data Hub and all necessary components, including SAP HANA, are fully containerized and delivered as a Docker image. The architecture separates computation from storage by decoupling data processing (in Kubernetes) from data storage in distributed cloud stores. The solution can be deployed in a variety of Kubernetes-supported environments, such as AWS, Azure, or Google Cloud Platform. This provides the flexibility to deploy in private clouds, managed clouds, and on-premise installations.

Keep track of the status of data workflows and pipelines that have been scheduled for execution.

of Your Data

SAP Data Hub empowers you to unlock the value of all your data. You can understand data from social sites, third-party data sources, devices, and sensors to identify ways to improve user engagement and products, and you can embed machine learning technology and predictive analytics into any use case.

Internet of Things Ingestion and
(Diagram labels: Data Streams, SAP Data Hub, App, SAP HANA, Data Lake)

Tackle the challenge of understanding vast quantities of raw data and events from disparate, semistructured sources. SAP Data Hub solves the challenge of capturing data, structuring the unstructured, and then analyzing data from distributed heterogeneous environments spanning messaging systems, cloud storage systems, and enterprise applications. It enables enterprises to ingest, integrate, and process sensor data, correlating it to structured application data for full business context and supporting a variety of modern and advanced processing paradigms to get actionable insights. These insights can be used to drive and influence intelligent processes. It improves processes to reduce risk and drive profitability, and it increases productivity for critical use cases such as product quality improvements and preventive maintenance.

Intelligent Data Warehouse
(Diagram labels: SAP Analytics Cloud, BW/4HANA, SAP Data Hub, SAP HANA, Data Lake)

Expand beyond traditional, monolithic data warehousing by orchestrating data, on premise or in the cloud, across data warehouses, data lakes, data marts, and enterprise applications. Combine structured and unstructured data and process data where it resides with data pipeline applications. Enable enterprises to quickly acquire new data sources with previously siloed data from traditional data warehouses, data marts, enterprise applications, and Big Data stores. Combine all types of sources, including structured and unstructured data, in a governed way, and streamline a large variety of processing on them. For scenarios such as customer behavior analysis, SAP Data Hub brings the power of intelligent data.

Data Science and Machine Learning Data
(Diagram labels: Machine Learning, Data Science, SAP Data Hub, Data Lake, App, App)

Drive data ingestion and data preparation from any source of any kind, and intuitively infuse business processes with data science and predictive analytics. SAP Data Hub provides one unified tool to process machine learning models and advanced analytics algorithms on any mix of engines, whether from SAP (predictive analysis library on SAP HANA, SAP Leonardo Machine Learning capabilities, and so on) or from third-party providers (such as Python, R, Spark, and TensorFlow). It helps data scientists and data architects to streamline data science projects, with a single overarching tool delivering ingestion, preparation, and processing across all kinds of disparate data, different engines, and disparate processing paradigms. Enterprises can now industrialize advanced analytics and machine learning in a cost-effective, time-effective, and reliable way. Use cases such as smart energy management, transaction fraud detection, and customer churn can now employ all trusted data for better machine learning.

with Data Cataloging
(Diagram labels: SAP Analytics Cloud, IoT, SAP Data Hub, Apps, Data Lake, Data Warehouse)

A variety of disparate data assets, structured and unstructured, distributed across different cloud and on-premise locations, are becoming more and more crucial to enterprises aiming to become data-driven businesses. But governance capabilities struggle to keep pace and are often still siloed, covering only partial views. This creates a challenge when sharing the intelligence with all the consumers of information. With SAP Data Hub, you can enable consistent and proper handling of data quality, integrity, and security across the enterprise. Profile, view, and expose the data of all connected systems. And you can architect data models more quickly in a highly visual environment.

SUMMARY

SAP Data Hub is an all-in-one data orchestration solution that ingests, orchestrates, processes, and governs any type and volume of data across your entire distributed data landscape. And it delivers enriched, trustworthy data for the Intelligent Enterprise. Innovative data pipelines refine, process, and distill a wide variety of data while eliminating the need for mass data movement. You can gain complete visibility into all your human-, machine-, and application-generated data and securely consume and share relevant data. Deliver the right data to the right users with the right context at the right time with the trusted, open, and flexible data landscape management solution.

OBJECTIVES
• Accelerate efficient delivery of intelligent data
• Gain a complete view of data assets and of their consumption across the enterprise
• Ingest, refine, and process disparate kinds of distributed data with one unified tool

SOLUTION
• All-in-one data orchestration solution
• Automated, reusable data pipelines for scale and efficiency
• Central graphical user interface to monitor and distribute data
• Integration of data and processes for on-premise, cloud, and hybrid environments
• Unified metadata catalog for landscape-wide visibility of data assets
• End-to-end data landscape management for effective use of all data

BENEFITS
• Achieve excellence in customer engagements by responding to information and events with intelligent data and actions
• Optimize business operations to reduce risk and costs by gaining a better understanding of your data
• Expand your company's portfolio of digital offerings by translating internal and external data sources into business sources

LEARN MORE

To find out more, call your SAP representative today or visit us online.

Studio SAP 52802enUS (18/09)

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE's or its affiliated companies' strategy and possible future developments, products, and/or platforms, directions, and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, and they should not be relied upon in making purchasing decisions.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies. See for additional trademark information and notices.