ICT, STREP FERARI ICT-FP Flexible Event processing for big data architectures. Collaborative Project D 6.3


1 ICT, STREP FERARI ICT-FP Flexible Event processing for big data architectures
Collaborative Project
D 6.3 Project Presentation
Contractual Date of Delivery:
Actual Date of Delivery:
Author(s): Michael Mock
Institution: Poslovna Inteligencija d.o.o.
Workpackage: WP6
Security: PU
Nature: O
Total number of pages: 37

2 Project coordinator name: Michael Mock
Project coordinator organisation name: Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Schloss Birlinghoven, Sankt Augustin, Germany
URL:
Revision: 1
Abstract: This document is the FERARI deliverable of WP6 for the first review period ( ). The project presentation gives an overview of the FERARI project, including the goals of the project, the project partners and the workpackage organization.
Revision history
Administration Status
Project acronym: FERARI
ID: ICT-FP
Document identifier: D 6.3 Project Presentation ( )
Leading Partner: Poslovna Inteligencija d.o.o.
Report version: 1
Report preparation date:
Classification: PU
Nature: OTHER
Author(s) and contributors: Michael Mock (FHG)
Status: - Plan - Draft - Working - Final x Submitted
Copyright: This report is © FERARI Consortium. Its duplication is restricted to personal use within the consortium and the European Commission.

3 Flexible Event processing for big data architectures (FERARI)

4 Introduction

5 FERARI
- An FP7 EC ICT project, Grant Agreement No
- STREP: Specific Targeted Research Project
- Grown out of the FP7 basic research project LIFT (FET Open)
- FERARI was ranked 6th of 33 proposals within objective 4.2 Scalable Data Analysis
- February 2014 to January 2017, funding: 2.95 Mio. EUR

6 FERARI - Consortium
- Fraunhofer IAIS (FHG)
- Technion (Technion) + Haifa University
- IBM (IBM)
- Poslovna Inteligencija (PI)
- Technical University of Crete (TUC)
- T-Hrvatski Telekom (HT)

7 Fraunhofer IAIS: Intelligent Analysis and Information Systems
- From sensor data to business intelligence, from media analysis to visual information systems: our research allows companies to do more with data
- 270 people: scientists, project engineers, technical and administrative staff
- Located on Fraunhofer Campus Schloss Birlinghoven/Sankt Augustin near Bonn
- Joint research groups and cooperation with
- Institute Director: Prof. Dr. Stefan Wrobel
- Lead researcher: Dr. Michael Mock

8 Technical University of Crete
- Founded in 1977 in Chania, Crete
- 120 faculty members, ~175 adjunct faculty and lab personnel
- 2900 undergraduate and 550 graduate students
- Around 200 research programs, total budget 20.5 million
- ECE department: 25 faculty, ~200 undergrad students/year
- Research organized in 10 research laboratories
- SoftNet Lab (headed by Prof. Garofalakis): focus on Big Data Analytics, Data Streams, Cloud Computing
- Lead researcher: Prof. Minos Garofalakis

9 TECHNION Israel Institute of Technology and University of Haifa
The Technion-Israel Institute of Technology is a major source of the innovation and brainpower that drives the Israeli economy, and a key to Israel's reputation as the world's Start-Up Nation. Its three Nobel Prize winners exemplify academic excellence.
- Located in Haifa, oldest university in Israel (1912)
- 600 faculty members (3 Nobel Laureates)
- Computer Science: 50 faculty members, 1500 students
- Lead researchers: Prof. Assaf Schuster, head of the Technion Computer Engineering Center (focus on Distributed and Scalable Data Mining, Monitoring Distributed Data Streams, Big Data Technologies and Analytics), and Dr. Daniel Keren, Department of Computer Science at Haifa University

10 IBM Research Haifa
- IBM Research is the innovation branch of IBM; its motto is "the world is our lab"
- 350 people: scientists, software engineers, subject matter experts
- Located in Haifa, Israel, on the campus of Haifa University
- The largest IBM Research Lab outside the USA
- Lead researcher: Fabiana Fournier

11 T-Hrvatski Telekom: Communication, Information & Entertainment, Always & Everywhere
- T-HT Group is the leading provider of telecommunications services in Croatia and the sole company to offer the full range of these services: it combines fixed and mobile telephony, data transmission, Internet and international communications
- T-HT's strategy: GROW - COMPETE - TRANSFORM
- T-HT: to be the online company and to power the online society and digital economy in Croatia and the region
- Key figures for 2012: revenues 991 mio EUR, EBITDA margin 45.3%, 5780 employees
- Lead representative: Maja Vekić-Vedrina

12 Poslovna inteligencija: Leader in business intelligence
We provide our customers with the best possible service in strategic consultancy and in the implementation of intelligent information systems for decision support, thereby helping them to create new value and identify new business opportunities.
- 90 employees: 90% project engineers and technical and business consultants, 10% sales and administration
- HQ in Zagreb, Croatia; offices in the UK, Slovenia, Serbia, Bosnia and Herzegovina, and Montenegro
- Extensive experience in the telecommunication industry and in R&D Big Data projects
- Lead representative: Dražen Oreščanin

13 Motivation
A number of recent technological developments have started to change our world forever:
- the rise of the internet
- the ever-growing amount of activity in social networks
- the widespread adoption of smart phones and other mobile devices
- the instrumentation of the world with sensors
This is accompanied by dropping prices for computers, networks, and storage.

14 Objectives
- Provide support for large scale services by making the sensor layer a first class citizen in Big Data architectures.
- Provide support for Complex Event Processing technology for business users in Big Data architectures.
- Provide support for integrating machine learning tasks in the architecture.
- Provide support for flexible and adaptive analytics workflows.
- Exemplify the potential of the new architecture in the telecommunication and the cloud domain.

15 Use cases
- Monitoring a smart energy grid.
- Analysing the traffic state of a large city using car-to-car communication.
- Monitoring the quality of a telecommunication network.
- Detecting latent failures in a large cloud of thousands of machines.
- Inspecting potentially fraudulent credit card transactions in real-time and blocking these transactions when necessary.

16 Application Scenarios
Mobile Phone Fraud Detection
- Detecting mobile phone fraud by analysing usage patterns
- Reliably detect mobile phone fraud
- Avoid financial losses due to fraud
- Scalability to millions of events/sec for simple filtering; fewer for more complex analysis, depending on the complexity of the task
Cloud Health Monitoring
- Cloud data centre activity log monitoring
- Possibility to replace time-interval-based maintenance by event-based maintenance
- Avoiding service down-time

17 Negotiation Question: Data Size
Quantity of data: The average monthly number of rated call detail records is > 650 mio and the total monthly quantity of data is > 300 GB. For raw call details, monthly quantities are significantly higher: the number of records is > 5500 mio and the total size of the data is > 10 TB.
Cloud services are one of the recently implemented services in Hrvatski Telekom. The number of cloud servers and of customers using cloud services is still fairly low, but the numbers are rapidly increasing. Currently, the cloud consists of 6 machines, which produce a total of > 40 GB of data per month. During the course of this project, we expect that the cloud might double its current size.
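To put the monthly volumes above into streaming terms, the following rough calculation converts them into average per-second rates and per-record sizes. It is only a sketch: the 30-day month and decimal units (1 GB = 10^9 bytes) are assumptions not stated on the slide, the function names are illustrative, and the inputs are the quoted lower bounds, so the rates are lower bounds on the average and say nothing about peak load.

```python
# Back-of-the-envelope rates from the lower bounds quoted on the slide.
# Assumptions (not from the source): 30-day month, decimal units.
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds


def per_second(records_per_month: float) -> float:
    """Average number of records per second over one month."""
    return records_per_month / SECONDS_PER_MONTH


def bytes_per_record(total_bytes: float, records: float) -> float:
    """Ratio of total data volume to record count."""
    return total_bytes / records


RATED_RECORDS, RATED_BYTES = 650e6, 300e9  # > 650 mio records, > 300 GB
RAW_RECORDS, RAW_BYTES = 5500e6, 10e12     # > 5500 mio records, > 10 TB

print(f"rated CDRs: {per_second(RATED_RECORDS):.0f} records/s on average")
print(f"raw CDRs:   {per_second(RAW_RECORDS):.0f} records/s on average")
print(f"rated CDRs: {bytes_per_record(RATED_BYTES, RATED_RECORDS):.0f} bytes/record")
print(f"raw CDRs:   {bytes_per_record(RAW_BYTES, RAW_RECORDS):.0f} bytes/record")
```

Under these assumptions the averages come out at roughly 250 rated and 2,100 raw records per second, with a few hundred bytes to about 2 KB per record.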

18 FERARI success criteria
The project's success will be rigorously measured by the following validation criteria:
- Communication reduction with respect to global/state-of-the-art solutions.
- Processing time relative to the size of the data, as a second quantitative validation criterion.
- For monitoring applications, the number of false alarms, as a third criterion.
- Number of domains to which the approach can be deployed. A key to this is the variety aspect enabled by Distributed Complex Event Processing.
- Flexibility. The system will be designed such that it can adapt to new, unforeseen circumstances and can be easily consumed.

19 Workpackages

20 Work Plan
Phase 1 (M1-M12)
- Use case definition
- Component definition
- Architecture definition
Phase 2 (M13-M24)
- Component refinement
- First use case prototype implementation
- First architecture implementation
Phase 3 (M25-M34)
- Will demonstrate and evaluate the impact of the methods developed in this project

21 Workpackage Structure
- WP1 Use Cases
- WP2 Architecture
- WP3 Communication Efficient, Low Latency Methods
- WP4 Flexible Event Processing
- WP5 Robust Distributed Stream Monitoring
* WPs 6 and 7, which will interact with all WPs for dissemination and management tasks, have been left out to increase readability.
The general flow of dependencies is top-down from the use cases to the architecture and methodological work. Architecture and methods interact iteratively, since there are many technical and methodological dependencies.

22 FERARI - Workpackages
Diagram labels: Software Platform, Complex event processing, Communication efficient processing, Stream processing, Prototype

23 WP1: Application Scenarios, Test bed, Prototype
Objectives:
- Selecting and defining the application scenarios fraud mining and cloud health monitoring
- Definition of testing & evaluation criteria for the end users at HT
- Setting up of a test bed both at HT and at the project partners' local sites
- Implementation and evaluation of scenarios in a prototype to demonstrate the advantage of FERARI with respect to the state of the art as well as to demonstrate its business value

24 WP2: Big Data Streaming Architecture & Technology Integration
Objectives:
- Define a Big Data architecture that makes the sensor layer a first class citizen of the architecture
- Define a data and control flow that can implement a push-based approach, so that processing can be partially done in situ
- Provide methods for robust distributed stream processing, including online machine learning
- Implement the architecture as a software platform (open source)

25 Architecture Diagram of FERARI
Event processing deals with these functions:
- Get events from sources (event producers).
- Route these events, filter them, normalize or otherwise transform them, aggregate them, and detect patterns over multiple events (event processing agents).
- Transfer events as alerts to a human or as a trigger to an autonomous adaptation system (event consumers).
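The three roles listed above (producers, event processing agents, consumers) can be illustrated with a small generator pipeline. This is a minimal sketch and not the FERARI platform or its CEP language: the `Event` type, the agent functions, the `cpu_load` event kind, and the thresholds are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    source: str
    kind: str
    value: float


def producer(readings: Iterable[Tuple[str, str, float]]) -> Iterator[Event]:
    """Event producer: turns raw readings into events."""
    for source, kind, value in readings:
        yield Event(source, kind, value)


def filter_agent(events: Iterable[Event], kind: str) -> Iterator[Event]:
    """Agent: keeps only events of one kind."""
    return (e for e in events if e.kind == kind)


def normalize_agent(events: Iterable[Event], scale: float) -> Iterator[Event]:
    """Agent: rescales values (here: percent to fraction)."""
    return (Event(e.source, e.kind, e.value * scale) for e in events)


def pattern_agent(events: Iterable[Event], threshold: float,
                  run_length: int) -> Iterator[Event]:
    """Agent: detects `run_length` consecutive events above `threshold`
    from the same source and emits one alert event per detected run."""
    streak: Dict[str, int] = {}
    for e in events:
        if e.value > threshold:
            streak[e.source] = streak.get(e.source, 0) + 1
        else:
            streak[e.source] = 0
        if streak[e.source] == run_length:
            streak[e.source] = 0
            yield Event(e.source, "alert", e.value)


def consumer(alerts: Iterable[Event]) -> List[str]:
    """Event consumer: formats alerts for a human operator."""
    return [f"ALERT {a.source}: load {a.value:.2f}" for a in alerts]


def run_pipeline(readings: Iterable[Tuple[str, str, float]]) -> List[str]:
    events = producer(readings)
    events = filter_agent(events, "cpu_load")
    events = normalize_agent(events, 0.01)
    return consumer(pattern_agent(events, threshold=0.9, run_length=3))


readings = [
    ("node-1", "cpu_load", 95.0),
    ("node-1", "disk_io", 10.0),
    ("node-2", "cpu_load", 40.0),
    ("node-1", "cpu_load", 97.0),
    ("node-1", "cpu_load", 99.0),
]
print(run_pipeline(readings))
```

Because every stage is a generator, events flow through one at a time, which mirrors the push-style processing of a stream rather than batch processing of stored data.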

26 WP2: Big Data Streaming Architecture & Technology Integration - TASKS
The tangible output of WP2 will be the definition of the big-data software architecture allowing for the integration of components for complex event processing, in-situ processing and robust distributed stream processing, including online machine learning. In addition, the architecture will be provided as a software platform.

27 Interdependencies between WP1 & WP2
WP2: Software Platform
- Open source
- General purpose for communication efficient big-data stream analysis algorithms
- Flexible event processing
- Components as libraries
- Interfaces to plug in concrete algorithms (learning, monitoring)
Diagram labels: In-stream learning, CEP Language, Software Platform, Plugin concrete algorithms, Prototype

28 WP3: Communication Efficient, Low-Latency Methods
Objectives:
- Develop in-situ processing methods that go beyond current methods
- Develop new algorithms that are able to efficiently detect granular events
- Identify and explore the right level of in-situ processing for scalability issues

29 In-Situ Processing (LIFT)
- Coordinator: monitors the global threshold (global condition / reference point; example: all nodes of a cloud work in a healthy state) and runs a resolution protocol after a violation
- Sensors (nodes): monitor their local safe zone (local condition) in situ
- An alarm message is sent only if the local safe zone is violated
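The idea of local safe zones can be sketched for one very simple global condition: "the average of all node values stays at or below a threshold". This is an illustrative simplification, not the LIFT protocol itself; the `Node` and `Coordinator` classes, the per-node bounds, and the message counting are assumptions made for the example. Each node holds a local bound, and the bounds always average to the global threshold, so as long as no node exceeds its bound the global condition must hold and no communication is needed.

```python
from typing import List


class Node:
    def __init__(self, bound: float) -> None:
        self.bound = bound  # local safe zone: value <= bound
        self.value = 0.0

    def update(self, value: float) -> bool:
        """Store a new local reading; True means the safe zone is violated."""
        self.value = value
        return value > self.bound


class Coordinator:
    def __init__(self, nodes: List[Node], threshold: float) -> None:
        self.nodes = nodes
        self.threshold = threshold  # global condition: mean(values) <= threshold
        self.messages = 0           # messages exchanged so far

    def observe(self, index: int, value: float) -> bool:
        """Feed a reading to one node; True means a global violation."""
        if self.nodes[index].update(value):
            self.messages += 1  # alarm message from the node
            return self.resolve()
        return False            # inside the safe zone: nothing is sent

    def resolve(self) -> bool:
        """Resolution after a local alarm: poll all nodes, check the
        global condition, and hand out fresh safe zones if it still holds."""
        values = [n.value for n in self.nodes]
        self.messages += 2 * len(self.nodes)  # one request and one reply each
        mean = sum(values) / len(values)
        if mean > self.threshold:
            return True
        slack = self.threshold - mean
        for n in self.nodes:
            n.bound = n.value + slack  # new bounds still average to threshold
            self.messages += 1
        return False


coordinator = Coordinator([Node(0.8) for _ in range(3)], threshold=0.8)
print(coordinator.observe(0, 0.5), coordinator.messages)  # no alarm
print(coordinator.observe(1, 0.9), coordinator.messages)  # local alarm only
print(coordinator.observe(2, 1.5), coordinator.messages)  # global violation
```

In this sketch the second reading violates a local safe zone without violating the global condition, so the coordinator redistributes the slack instead of raising a global alarm. A centralized solution would send one message per reading, whereas here readings inside the safe zone cost nothing.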

30 WP4: Flexible Event Processing
Objectives:
- Develop a Complex Event Processing model and methodology suitable for specification, implementation, and maintenance of event-driven applications
- Providing semantics for specifying event patterns
- Providing an end-user consumable framework for flexibly specifying event processing systems
- Providing modules for generation of an event processing network implementation and optimization plan that allows distributed in-situ monitoring of complex event patterns

31 WP5: Robust Distributed Stream Monitoring
Objectives:
- Develop methods for robust distributed stream monitoring
- Exploit online machine learning methods to adapt the FERARI data/control flow to unforeseen circumstances
- Provide support for integrating machine learning into the architecture
- Accounting for uncertainty in the architecture

32 Simple LIFT Example
Mobility monitoring using stationary sensors
- Each sensor computes a (linear counting) sketch of the Bluetooth addresses in sensor range
- The sketch is a bit-array of fixed length
- Provides a set of mobility mining primitives: count distinct, union, intersection
Diagram labels: sensors S_i and S_j, sketches sk(r_Si) and sk(r_Sj), Coordinator
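The primitives named above can be illustrated with a small linear counting sketch. This is a sketch under stated assumptions rather than the project's implementation: the class name, the bit-array length `m`, and the use of SHA-256 as the hash function are choices made for the example. The distinct count is estimated as -m * ln(V), where V is the fraction of bits still zero; the union of two sketches is their bitwise OR; the intersection is estimated by inclusion-exclusion.

```python
import hashlib
import math


class LinearCountingSketch:
    def __init__(self, m: int = 1 << 14) -> None:
        self.m = m     # fixed length of the bit array
        self.bits = 0  # a Python int used as the bit array

    def add(self, item: str) -> None:
        """Hash the item to one bit position and set that bit."""
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % self.m
        self.bits |= 1 << position

    def estimate(self) -> float:
        """Estimated number of distinct items added so far."""
        zeros = self.m - bin(self.bits).count("1")
        if zeros == 0:
            raise ValueError("sketch is saturated; use a larger m")
        return -self.m * math.log(zeros / self.m)

    def union(self, other: "LinearCountingSketch") -> "LinearCountingSketch":
        """Sketch of the union: bitwise OR of both bit arrays."""
        if self.m != other.m:
            raise ValueError("sketches must have the same length")
        merged = LinearCountingSketch(self.m)
        merged.bits = self.bits | other.bits
        return merged


def intersection_estimate(a: LinearCountingSketch,
                          b: LinearCountingSketch) -> float:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return a.estimate() + b.estimate() - a.union(b).estimate()


# Two sensors see overlapping sets of (made-up) device identifiers.
sensor_i, sensor_j = LinearCountingSketch(), LinearCountingSketch()
for n in range(0, 600):
    sensor_i.add(f"device-{n}")
for n in range(400, 1000):
    sensor_j.add(f"device-{n}")

print(round(sensor_i.estimate()), "distinct devices at sensor i (true: 600)")
print(round(sensor_i.union(sensor_j).estimate()), "at either sensor (true: 1000)")
print(round(intersection_estimate(sensor_i, sensor_j)), "at both sensors (true: 200)")
```

Each sensor only has to ship its fixed-length bit array to the coordinator, not the list of addresses, which is what makes the approach communication efficient. The estimates are approximate, and the intersection estimate is the least accurate of the three because it combines the errors of the other estimates.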

33 WP6: Dissemination & Exploitation
Objectives:
- Disseminating the FERARI theoretical framework to the scientific community of data mining and distributed systems
- Outlining the methodological and technical superiority of the proposed solution compared to other approaches to distributed monitoring
- Dissemination to high-profile early adopters within the scope of the application scenarios

34 WP7: Coordination
Objectives:
- Establishment of a strong project management scheme
- Successful achievement of the project objectives on time and within budget
- Generation of synergies amongst the project members
- Continuous monitoring of the project's progress and timely initiation of corrective actions (if needed)
- Coordination of the continuous process aiming to transfer the knowledge generated to the relevant scientific communities

35 List of Deliverables
Deliverable No | Deliverable name | WP No. | Nature | Dissemination level | Due date
1.1 | Application Scenario Description and Requirement Analysis | 1 | R | PU | M12
1.2 | Final Application Scenarios and Description of Test Environment | 1 | R | PU | M24
1.3 | Application Scenario & Prototype Report | 1 | R | PU | M
 | Architecture definition | 2 | R | PU | M12
2.2 | System Prototype | 2 | R | PU | M24
2.3 | Final Prototype | 2 | R | PU | M
 | Requirements and state of the art overview on in-situ methods | 3 | R | PU | M12
3.2 | Development of algorithms based on in-situ, low-latency methods | 3 | R | PU | M24
3.3 | Implementation and evaluation of in-situ, low-latency algorithms | 3 | R | PU | M36
4.1 | Requirements and state of the art overview on Flexible Event Processing | 4 | R | PU | M12
4.2 | Goal-driven model and methodology for specification of event processing applications | 4 | R | PU | M24
4.3 | Automatic generation of annotated event processing network from the goal-driven model | 4 | R | PU | M
 | Requirements and state of the art overview on Robust Stream Monitoring | 6 | R | PU | M12
5.2 | Algorithms for Robust Distributed Stream Monitoring and Supporting Data Integrity | 6 | R | PU | M24
5.3 | Implementation of Algorithms for Robust Distributed Stream Monitoring and Supporting Data Integrity | 6 | R | PU | M
 | Project Fact Sheet | 6 | O | PU | M3
6.2 | Project Web Site | 6 | O | PU | M3
6.3 | Project Presentation | 6 | O | PU | M3
6.4 | Project Workshop, Seminar and Training Course | 6 | R | PU | M
 | First Draft of Exploitation Plan | 6 | R | CO | M
 | Exploitation and Dissemination Plan | 6 | R | CO | M
 | Quality Assurance Plan | 7 | R | PU | M
 | 1st Annual Project Report | 7 | R | CO | M
 | 2nd Annual Project Report | 7 | R | CO | M
 | Final Project Report | 7 | R | CO | M36
Each WP-Leader is responsible for the deliverables of his or her WP more details in the

36 Summary
The goal of the FERARI project is to pave the way for efficient, real-time Big Data technologies of the future. It will enable business users to express complex analytics tasks through a high-level declarative language that supports distributed Complex Event Processing and sophisticated machine learning operators as an integral part of the system architecture. Effective, real-time execution at scale will be achieved by making the sensor layer a first-class citizen in distributed streaming architectures and leveraging in-situ data processing as a first (and, in the long run, the only realistic) choice for realizing planetary-scale Big Data systems.
