FLINK IN ZALANDO S WORLD OF MICROSERVICES JAVIER LOPEZ MIHAIL VIERU

Similar documents
MapR Pentaho Business Solutions

Introduction to Stream Processing

Confidential

Real-time Streaming Applications on AWS Patterns and Use Cases

Architecting for Real- Time Big Data Analytics. Robert Winters

Operational Hadoop and the Lambda Architecture for Streaming Data

TechArch Day Digital Decoupling. Oscar Renalias. Accenture

Microsoft Azure Essentials

Enterprise Analytics Accelerating Your Path to Value with an Open Analytics Platform

20775: Performing Data Engineering on Microsoft HD Insight

TechValidate Survey Report. Converged Data Platform Key to Competitive Advantage

Big Data Introduction

How In-Memory Computing can Maximize the Performance of Modern Payments

Building event-driven Microservices with Kafka Ecosystem

20775 Performing Data Engineering on Microsoft HD Insight

SCALING ZALANDO

Datametica. The Modern Data Platform Enterprise Data Hub Implementations. Why is workload moving to Cloud

EXPERIENCE EVERYTHING

Common Customer Use Cases in FSI

Building data-driven applications with SAP Data Hub and Amazon Web Services

SOLUTION SHEET Hortonworks DataFlow (HDF ) End-to-end data flow management and streaming analytics platform

DOAG Big Data Days 2018 DWH Modernization

20775A: Performing Data Engineering on Microsoft HD Insight

Course Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight.

CASE STUDY Delivering Real Time Financial Transaction Monitoring

Configurable Data Pipelines Towards a Reference Architecture. E. U. Syed Philips Lighting Research Sunday, June 18, 2017

Data Analytics. Nagesh Madhwal Client Solutions Director, Consulting, Southeast Asia, Dell EMC

SOLUTION SHEET End to End Data Flow Management and Streaming Analytics Platform

20775A: Performing Data Engineering on Microsoft HD Insight

How to Build Your Data Ecosystem with Tableau on AWS

Real-time IoT Big Data-in-Motion Analytics Case Study: Managing Millions of Devices at Country-Scale

Managing Data in Motion with the Connected Data Architecture

Your Top 5 Reasons Why You Should Choose SAP Data Hub INTERNAL

Secure information access is critical & more complex than ever

A Reference Architecture for Hybrid Integration. Peter Broadhurst Senior Technical Staff Member for IBM App Connect

ProSiebenSat1 s Digital Media Transformation AWS Enterprise Summit 2015

WELCOME TO. Cloud Data Services: The Art of the Possible

MapR Streams A global pub-sub event streaming system for big data and IoT

Integrating the Enterprise. How Business Leaders are Implementing Digital Integration

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

ADVANCED ANALYTICS & IOT ARCHITECTURES

Maturing IoT solutions with Microsoft Azure. Glenn Colpaert Azure/IoT Domain

Alexander Klein. ETL meets Azure

REDEFINE BIG DATA. Zvi Brunner CTO. Copyright 2015 EMC Corporation. All rights reserved.

RHAPSODY VISION & ROADMAP FIORA AU, TIM WHITTINGTON OCT 2017

Combine Microservices Framework for Flexible, Scalable, High Availability Big Data Analytics

The Need For Speed: Fast Data Development Trends Insights from over 2,400 developers on the impact of Data in Motion in the real world

When Big Data Meets Fast Data

Amsterdam. (technical) Updates & demonstration. Robert Voermans Governance architect

Data Integration for the Real-Time Enterprise

C3 Products + Services Overview

EXECUTIVE BRIEF. Successful Data Warehouse Approaches to Meet Today s Analytics Demands. In this Paper

CI in the cloud. Case Talend & Nordcloud Ilja Summala, Group CTO, Nordcloud Anubhav Sharma, Senior Software Architect, Talend Deutschland

St Louis CMG Boris Zibitsker, PhD

Intro to Big Data and Hadoop

New Big Data Solutions and Opportunities for DB Workloads

Data Strategy: How to Handle the New Data Integration Challenges. Edgar de Groot

IoT ANALYTICS IN THE ENTERPRISE WITH FUNL

AWS Architecture Case Study: Real-Time Bidding. Tom Maddox, Solutions Architect

Modern Analytics Architecture

Insights-Driven Operations with SAP HANA and Cloudera Enterprise

DOWNTIME IS NOT AN OPTION

IBM Message Hub. James Bennett Offering Manager, IBM Cloud Integration IBM Corporation

Cask Data Application Platform (CDAP) Extensions

Integrating MATLAB Analytics into Enterprise Applications The MathWorks, Inc. 1

Data Analytics Use Cases, Platforms, Services. ITMM, March 5 th, 2018 Luca Canali, IT-DB

Introducing Infor Xi/Ming.le for M3

EBOOK: Cloudwick Powering the Digital Enterprise

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Processing Big Data with Pentaho. Rakesh Saha Pentaho Senior Product Manager, Hitachi Vantara

Industrial IoT Solution Architecture Design From Connectivity to Data

CA UIM Log Analytics. Gain Full Stack Visibility With Contextual Log Insights. Mark Tukh Principal Presale Consultant CA NESS AT

Simplifying Your Modern Data Architecture Footprint

Business is being transformed by three trends

Spotlight Sessions. Nik Rouda. Director of Product Marketing Cloudera, Inc. All rights reserved. 1

Big data is hard. Top 3 Challenges To Adopting Big Data

What companies are looking for

Edge Analytics for IoT Device Intelligence

Databricks Cloud. A Primer

Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand

Azure PaaS and SaaS Microsoft s two approaches to building IoT solutions

NOT ALL CLOUDS ARE CREATED EQUAL

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW

Dell IT Proven Dell IT s Big Data Journey with Big Possibilities

COURSE OUTLINE: Implementing a Data Warehouse with SQL Server Implementing a Data Warehouse with SQL Server 2014

Processing and Analyzing Streams. CDRs in Real Time

Stream Processing. Kai Wähner. as Game Changer for the Internet of

Device Data in IoT: Use it Early and Often. Josh Pederson, Director of Product Management Ayla Networks July 25, 2017

Financial Data Enterprise Platform

Leveraging Oracle Big Data Discovery to Master CERN s Data. Manuel Martín Márquez Oracle Business Analytics Innovation 12 October- Stockholm, Sweden

Using Mesos Schedulers with Amazon EC2 Container Service

GUIDE The Enterprise Buyer s Guide to Public Cloud Computing

Transforming Big Data to Business Benefits

Oracle Retail Data Model (ORDM) Overview

The Internet of Things Wind Turbine Predictive Analytics. Fluitec Wind s Tribo-Analytics System Predicting Time-to-Failure

Scaling up MATLAB Analytics with Kafka and Cloud Services

PORTFOLIO AND TECHNOLOGY DIRECTION ARMISTEAD SAPP & RANDY GUARD

Taking the Pain Out of Developing and Deploying Streaming Applications. Craig Blitz, Senior Product Director Gerard Maas, Senior SW Engineer

Azure Data Factory Hybrid data integration, at global scale. Erika Harris Senior Program Manager AzureCAT

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

Transcription:

FLINK IN ZALANDO S WORLD OF MICROSERVICES JAVIER LOPEZ MIHAIL VIERU 12-09-2016

AGENDA Zalando s Microservices Architecture Saiki - Data Integration and Distribution at Scale Flink in a Microservices World Stream Processing Use Cases: o o Business Process Monitoring Continuous ETL Future Work 2

ABOUT US Mihail Vieru Big Data Engineer, Business Intelligence Javier López Big Data Engineer, Business Intelligence 3

4

One of Europe's largest online fashion retailers 15 countries ~19 million active customers ~3 billion revenue 2015 1,500 brands 150,000+ products 11,000+ employees in Europe 5

ZALANDO TECHNOLOGY 1300+ TECHNOLOGISTS Rapidly growing international team http://tech.zalando.com 6

VINTAGE ARCHITECTURE

VINTAGE BUSINESS INTELLIGENCE Dev Business Logic Business Logic Business Logic Business Logic DBA Database Database Database Database Classical ETL process BI Data Warehouse (DWH) 8

VINTAGE BUSINESS INTELLIGENCE Exasol DWH Oracle 9

RADICAL AGILITY

RADICAL AGILITY AUTONOMY MASTERY PURPOSE 11

RADICAL AGILITY - AUTONOMY 12 Technologies Operations Teams

REST API REST API REST API REST API REST API SUPPORTING AUTONOMY: MICROSERVICES Business Logic Business Logic Business Logic Database Database Database Business Logic Business Logic Database Database 13

REST API SUPPORTING AUTONOMY: MICROSERVICES Team A Business Logic public Internet REST API Business Logic Team B Database Database Applications communicate using REST APIs Databases hidden behind the walls of AWS VPC 14

REST API SUPPORTING AUTONOMY: MICROSERVICES Team A Business Logic public Internet REST API Business Logic Team B Database Database Classical ETL process is impossible! 15

App A REST API App B REST API App C REST API App D REST API SUPPORTING AUTONOMY: MICROSERVICES Business Logic Business Logic Business Logic Business Logic Database Database Database Database Business Intelligence 16

SAIKI

SAIKI DATA PLATFORM App A App B App C App D BI Data Warehouse SAIKI 18

SAIKI DATA INTEGRATION & DISTRIBUTION App A App B App C App D BI Data Warehouse E.g. Forecast DB SAIKI REST API Exporter Stream Processing via Apache Flink Data Lake. AWS S3 19

Data sources Extraction SAIKI SUMMARY Data sources Connections Data sources Technologies Data Delivery B E F O R E Δ J D B C A F T E R REST 20

FLINK IN A MICROSERVICES WORLD

OPPORTUNITIES FOR NEXT GEN BI Cloud Computing - Distributed ETL - Scale Access to Real Time Data - All teams publish data to central event bus Hub for Data Teams - Data Lake provides distributed access and fine grained security - Data can be transformed (aggregated, joined, etc.) before delivering it to data teams Semi-Structured Data General-purpose data processing engines like Flink or Spark let you define own data types and functions. - Fabian Hueske, dataartisans 22

THE RIGHT FIT STREAM PROCESSING 23

THE RIGHT FIT STREAM PROCESSING ENGINE Candidates: Storm & Samza ruled out because of batch processing requirement 24

25

26 SPARK VS. FLINK DIFFERENCES Feature Apache Spark 1.5.2 Apache Flink 0.10.1 Processing mode micro-batching tuple at a time Temporal processing support processing time event time, ingestion time, processing time Latency seconds sub-second Back pressure handling manual configuration implicit, through system architecture State access full state scan for each microbatch value lookup by key Operator library neutral ++ (split, windowbycount..) Support neutral ++ (mailing list, direct contact & support from data Artisans)

APACHE FLINK true stream processing framework process events at a consistently high rate with low latency scalable great community and on-site support from Berlin/ Europe university graduates with Flink skills https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/ 27

FLINK ON AWS - OUR APPLIANCE MASTER ELB EC2 Docker Flink Shadow Master EC2 Docker Flink Master WORKERS ELB EC2 Docker Flink Worker EC2 Docker Flink Worker EC2 Docker Flink Worker 28

USE CASES

BUSINESS PROCESS MONITORING

BUSINESS PROCESS A business process is in its simplest form a chain of correlated events: start event completion event ORDER_CREATE ALL_PARCELS_SHIPPED D Business Events from the whole Zalando platform flow through Saiki => opportunity to process those streams in near real time 31

REAL-TIME BUSINESS PROCESS MONITORING Check if business processes in the Zalando platform work Analyze data on the fly: o Order velocities o Delivery velocities o Control SLAs of correlated events, e.g. parcel sent out after order 32

ARCHITECTURE BPM Operational Systems App A App B Nakadi Event Bus Kafka2Kafka Unified Log Stream Processing Elasticsearch App C Saiki BPM Cfg Service UI Alert Svc OAUTH PUBLIC INTERNET 33

34 HOW WE USE FLINK IN BPM 1000+ Event Types; 1 Event Type -> 1 Kafka topic Analyze processes with correlated event types (Join & Union) Enrich data based on business rules Sliding Windows (1min to 48hrs) for Platform Snapshots State for alert metadata Generation and processing of Complex Events (CEP lib)

STREAMING ETL

Extract Transform Load (ETL) Traditional ETL process: Batch processing No real time ETL tools Heavy processing on the storage side 36

WHAT CHANGED WITH RADICAL AGILITY? Data comes in a semi-structured format (JSON payload) Data is distributed in separate Kafka topics There would be peak times, meaning that the data flow will increase by several factors Data sources number increased by several factors 37

ARCHITECTURE STREAMING ETL Operational Systems App A App B App C Importer Oracle DWH Nakadi Event Bus Kafka2Kafka ` Unified Log Exporter Stream Processing Saiki Streaming ETL 38

HOW WE (WOULD) USE FLINK IN STREAMING ETL Transformation of complex payloads into simple ones for easier consumption in Oracle DWH Combine several topics based on Business Rules (Union, Join) Pre-Aggregate data to improve performance in the generation of reports (Windows, State) Data cleansing Data validation 39

FUTURE USE CASES

COMPLEX EVENT PROCESSING FOR BPM Cont. example business process: Multiple PARCEL_SHIPPED events per order Generate complex event ALL_PARCELS_SHIPPED, when all PARCEL_SHIPPED events received (CEP lib, State) 41

DEPLOYMENTS FROM OTHER BI TEAMS Flink Jobs from other BI Teams Requirements: manage and control deployments isolation of data flows 42 o prevent different jobs from writing to the same sink resource management in Flink o share cluster resources among concurrently running jobs StreamSQL would significantly lower the entry barrier

REPLACE KAFKA2KAFKA COMPONENT Python app extracts events from REST API Nakadi Event Bus writes them to our Kafka cluster Idea: Create Nakadi consumer/producer to enable stream processing with Flink to other internal users 43 (first POC done)

OTHER FUTURE TOPICS New use cases for Real Time Analytics/ BI o o Sales monitoring Price monitoring Fraud detection for payments (evaluation) Contact customer according to variable event pattern (evaluation) 44

CONCLUSION Flink proved to be the right fit for our current stream processing use cases. It enables us to build Zalando s Next Gen BI platform. https://tech.zalando.de/blog/?tags=saiki 45

THANK YOU