Building a Data Lake with Spark and Cassandra Brendon Smith & Mayur Ladwa


1 Building a Data Lake with Spark and Cassandra Brendon Smith & Mayur Ladwa July 2015

2 BlackRock: Who We Are
BlackRock is the world's largest investment manager:
- Manages over $4.7 trillion in assets
- Owns over 5% of 2,500 companies
- #31 on Fortune's list of the World's Most Admired Companies 2015
- World's largest provider of Exchange-Traded Funds
- Advisor to governments and central banks
- Technology provider (SaaS)
*BLK data is as of 31st March 2015

3 BlackRock as a Technology Provider (SaaS)
Aladdin is BlackRock's enterprise investment system.
- Used by BlackRock and over 60 other large financial institutions to manage over $15 trillion in assets
- Generates ~$500M in revenue
Every day:
- 4 million financial instruments are managed
- 12 million financial transactions are processed
- 10 million investment holdings are valued and analysed
- 100 million Monte Carlo iterations are computed
- 6 billion messages are sent on Aladdin's message bus
- 2 billion Cassandra transactions are executed
1,000+ developers work on Aladdin.
No tolerance for downtime.
*BLK data is as of 31st March 2015

4 What our data development teams do (and emerging technical challenges)
We provide analysis and visualization capabilities to our users: Portfolio Managers, Traders, Researchers, and Clients.
This helps our investment users:
- Make more informed investment decisions
- Discover new investment strategies (data-driven)
This helps the firm:
- Manage risk more effectively for our clients
- Protect our clients' money by assessing risk scenarios and responses
- Respond quickly to macro events (e.g. Greek crisis)
This requires:
- Storing financial time-series data of growing Volume
- With varied and fast-changing structure, such as new risk scenarios (Variety)
- Fast and flexible queries on this data (Velocity of analysis: train-of-thought)
Example: Firm Wide Exposures & Risk Scenario Exceptions

5 The Past: What tech have we been using for this?
Previously, we were using:
- Star schemas for relational use cases (Kimball et al.)
  - Great for aggregation, slicing and dicing data, driven by users
  - Struggling with growing volume (difficult to manage data >50TB on an RDBMS)
- Cassandra for semi-structured data and DB cache use cases
  - Good for linear scalability, resilience, semi-structured support
  - But expensive to re-point many reporting and BI tools (SQL is still the lingua franca)
  - Not strong at aggregating large data and flexible slice & dice
- Hadoop for special-purpose analysis (financial modelling on specific asset classes)
Recently, more and more use cases need to combine:
- Fast analysis and aggregation of large relational data
- Semi-structured and fast-changing data points
- Larger volumes of data (need to support >1000TB for many use cases)
- Machine learning & data-driven investment strategies
Spark + Cassandra may offer the best of all of these.

6 The Future
We are building a new big data platform that will:
- Provide a data hub for clients to co-locate their own financial data with Aladdin data (schema on read)
- Let them query and visualize this in a highly interactive way
- Work with the visualization tools we are building, or with their choice of reporting and BI tools, including Tableau, Qlik, Spotfire
- Support data science & data-driven investment strategies, with notebooks and integration with R, MATLAB and other packages
The stack we are using includes:
- Spark: SQL, Streaming, MLlib, Job Server as the core big-data framework
- Parquet: for compressed columnar format
- Cassandra: as distributed persistence layer
- HDFS interface on Cassandra (SnackFS)
- Languages: Scala, Java 8, and Python
- Web tools: D3, HTML5, AngularJS, Bootstrap

7 Technologies
Cassandra, Parquet, Jupyter, Spark, Scala, Python, D3, Mesos, HDFS

8 Underlying Stack
Technology | What is it? | Why use it?
Cassandra | Distributed database | Peer-to-peer design allows for high performance with linear scalability; no single points of failure, even across multiple data centres
SnackFS | Lightweight HDFS-compatible file system | Requires no additional setup on the Cassandra cluster; DFS that works as a replacement for HDFS; particularly targeted towards Spark users who are using Cassandra in their infrastructure
Parquet | Storage format | Columnar data format; very efficient compression
Spark | Cluster computing framework | Speed (at least 10x faster than Hadoop MR); ease of use and many supported languages; powerful stack of high-level tools out of the box; runs over many other technologies
Jupyter | Interactive computational environment | Code execution; rich graphics and plots; easy for users to become data scientists

9 The storage problem which has been solved
Storage:
- Storage is getting cheaper: a 1TB disk is ~ 35
- Capacity doubles every 18 months
- Time to read 1TB from disk is ~3 hours (at 100MB/s), not including network
Examples:
- Facebook's daily logs are 60TB
- Google's web index is over 10PB
What do traditional analysis tools have in common?
- *NIX shell commands and scripts
- Python scripts and Pandas
- R
Single machines can no longer process or even store all the data.
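The read-time figure on this slide can be checked with a short calculation. The sketch below assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a constant sequential throughput; the function name and the 100-disk example are illustrative and do not come from the talk.

```python
# Back-of-the-envelope estimate of a sequential full-disk read.
# Assumption: decimal units and a constant throughput with no seek or network cost.

TB = 10**12
MB = 10**6


def read_time_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to stream size_bytes at a constant throughput."""
    return size_bytes / throughput_bytes_per_s / 3600


one_disk_hours = read_time_hours(1 * TB, 100 * MB)
print(f"1 TB from one disk: {one_disk_hours:.2f} hours")

# Hypothetical: the same data spread evenly over 100 disks read in parallel.
n_disks = 100
parallel_minutes = one_disk_hours / n_disks * 60
print(f"1 TB from {n_disks} disks in parallel: {parallel_minutes:.1f} minutes")
```

The single-disk figure comes out just under three hours, which is the motivation for spreading both storage and reads across many machines.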

10 Distributed processing of data
So we are storing our data over a distributed system! We bulk load data into our cluster of disks, or collect data over time.
Wait, how do we kick off a program that does our analysis over many machines?
Example: counting occurrences of words with MapReduce.
Google pioneered the MapReduce algorithm, which was later implemented by Apache as an easy-to-use, downloadable technology known as Hadoop.
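The word-count example can be sketched in plain Python to show the three MapReduce phases. This is a single-process illustration, not Hadoop code: the function names and the sample lines are assumptions made for the sketch, and a real cluster would run the map and reduce phases on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input lines, standing in for blocks of a distributed file.
lines = ["the quick brown fox", "the lazy dog", "The fox"]
counts = word_count(lines)
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be split across workers; only the shuffle needs to move data between them.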

11 Hadoop
Awesome, we've cracked it, let's go to the pub.
Hold on:
- Where does the data get stored when all this is going on?
- What if there's a massive amount of data?
- What happens if this is an iterative job?
Disk I/O is slow here.

12 Spark
RAM is faster than disk?!

13 What does it look like?
Masters can use ZooKeeper for high availability.

14 Spark and regular MapReduce differences
- Generalised patterns: unified engine for many use cases
- Lazy evaluation: reduces wait states, better pipelining
- Lower overhead for starting jobs, as well as a much less expensive shuffle
- More operations: Map, Reduce, Join, Sample, etc.
- Handles streaming as well as batch jobs
- Lots of programming environments: Scala, Java, Python, R
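Lazy evaluation and pipelining can be illustrated with Python generators. This is an analogy only, not Spark code: the function names and the log are assumptions made for the sketch. It shows that building the pipeline runs nothing, and that each element then flows through every stage in turn instead of each stage materialising a full intermediate result.

```python
log = []


def double(xs):
    """First stage: double each element, recording when it actually runs."""
    for x in xs:
        log.append(f"double({x})")
        yield x * 2


def keep_multiples_of_four(xs):
    """Second stage: keep only multiples of four."""
    for x in xs:
        log.append(f"filter({x})")
        if x % 4 == 0:
            yield x


# Building the pipeline does no work yet.
pipeline = keep_multiples_of_four(double([1, 2, 3, 4]))
steps_before_consuming = len(log)

# Consuming the pipeline triggers the work, one element at a time.
result = list(pipeline)
print(steps_before_consuming)
print(result)
print(log)
```

The log interleaves the two stages element by element, which is the kind of pipelining the slide refers to: no intermediate list of doubled values is ever built.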

15 How do you code for this beast?
You have a SparkContext object, sc, which is set up to point at your Spark cluster.
You create an RDD (DataFrames are an extension to RDDs):
rdd = sc.textFile("numbers.csv")
rdd = sc.parallelize([1, 2, 3, 4])
Perform a transformation on that RDD, which returns a new RDD:
rdd.map(lambda x: x * 2)          RDD: [1, 2, 3, 4] -> [2, 4, 6, 8]
rdd.filter(lambda x: x % 2 == 0)  RDD: [1, 2, 3, 4] -> [2, 4]
Perform an action to have your recipe sent down to the workers as tasks and come back as a result:
rdd.count()
rdd.collect()
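The split between transformations and actions can be mimicked without a cluster. The class below is a hypothetical single-process stand-in (the name ToyRDD and its internals are assumptions for illustration, not Spark's implementation): transformations only record a step and return a new object, and the recorded recipe runs when an action is called.

```python
class ToyRDD:
    """Toy stand-in for a distributed collection: lazy transformations, eager actions."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)

    # Transformations: record the step and return a new ToyRDD; nothing runs yet.
    def map(self, fn):
        return ToyRDD(self._data, self._steps + (("map", fn),))

    def filter(self, predicate):
        return ToyRDD(self._data, self._steps + (("filter", predicate),))

    def _run(self):
        """Apply the recorded recipe to the source data as one lazy chain."""
        items = iter(self._data)
        for kind, fn in self._steps:
            # Inside a method, map and filter here are the Python built-ins.
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: execute the recipe and return a plain Python result.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


numbers = ToyRDD([1, 2, 3, 4])
doubled = numbers.map(lambda x: x * 2)
evens = numbers.filter(lambda x: x % 2 == 0)

print(doubled.collect())
print(evens.collect())
print(numbers.count())
```

Each transformation leaves the original object untouched, so the same source can feed several independent recipes, as `doubled` and `evens` do here.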

16 Getting data out of our stack
> val sqlContext = new SQLContext(sc)
> val df = sqlContext.parquetFile("/aladdin_data_v2/tradedata.parquet")
> df.registerTempTable("tradedata")
> val resultDf = sqlContext.sql("select asset, avg(price) from tradedata where trade_date between '01-FEB-15' and '01-MAR-15' group by asset")
> resultDf.collect.foreach(println)
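The shape of this query (a date-range filter followed by a per-asset average) can be tried locally with Python's built-in sqlite3 module. This is a stand-in for Spark SQL, not the talk's stack: the table contents are invented sample rows, and dates are stored as ISO strings so that string comparison matches date order. It also shows that BETWEEN expects the lower bound first; with the bounds reversed the query matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tradedata (asset TEXT, price REAL, trade_date TEXT)")

# Hypothetical sample rows, for illustration only.
rows = [
    ("AAA", 10.0, "2015-01-15"),
    ("AAA", 12.0, "2015-02-10"),
    ("AAA", 14.0, "2015-02-20"),
    ("BBB", 100.0, "2015-02-05"),
    ("BBB", 90.0, "2015-03-15"),
]
conn.executemany("INSERT INTO tradedata VALUES (?, ?, ?)", rows)

query = """
SELECT asset, AVG(price)
FROM tradedata
WHERE trade_date BETWEEN ? AND ?
GROUP BY asset
ORDER BY asset
"""

# Lower bound first: rows dated within the range are averaged per asset.
result = conn.execute(query, ("2015-02-01", "2015-03-01")).fetchall()
print(result)

# Bounds reversed: no row can satisfy the condition.
reversed_result = conn.execute(query, ("2015-03-01", "2015-02-01")).fetchall()
print(reversed_result)
```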

17 The outcome

18 Disclaimer
The data provided is for informational purposes only. The information and opinions contained on this website are derived from proprietary and non-proprietary sources deemed by BlackRock to be reliable, are not necessarily all inclusive and are not guaranteed as to accuracy. BlackRock shall not have any liability for the accuracy of the information contained herein, for delays or omissions therein, or for any results based on the use of such information. BlackRock, Inc. All rights reserved. BLACKROCK and ALADDIN are registered and unregistered trademarks of BlackRock, Inc., or its subsidiaries in the United States and elsewhere. All other marks are the property of their respective owners.

20775: Performing Data Engineering on Microsoft HD Insight

20775: Performing Data Engineering on Microsoft HD Insight Let s Reach For Excellence! TAN DUC INFORMATION TECHNOLOGY SCHOOL JSC Address: 103 Pasteur, Dist.1, HCMC Tel: 08 38245819; 38239761 Email: traincert@tdt-tanduc.com Website: www.tdt-tanduc.com; www.tanducits.com

More information

20775 Performing Data Engineering on Microsoft HD Insight

20775 Performing Data Engineering on Microsoft HD Insight Duración del curso: 5 Días Acerca de este curso The main purpose of the course is to give students the ability plan and implement big data workflows on HD. Perfil de público The primary audience for this

More information

Course Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight.

Course Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight. Course Content Course Description: The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight. At Course Completion: After competing this course,

More information

20775A: Performing Data Engineering on Microsoft HD Insight

20775A: Performing Data Engineering on Microsoft HD Insight 20775A: Performing Data Engineering on Microsoft HD Insight Duration: 5 days; Instructor-led Implement Spark Streaming Using the DStream API. Develop Big Data Real-Time Processing Solutions with Apache

More information

20775A: Performing Data Engineering on Microsoft HD Insight

20775A: Performing Data Engineering on Microsoft HD Insight 20775A: Performing Data Engineering on Microsoft HD Insight Course Details Course Code: Duration: Notes: 20775A 5 days This course syllabus should be used to determine whether the course is appropriate

More information

BIG DATA AND HADOOP DEVELOPER

BIG DATA AND HADOOP DEVELOPER BIG DATA AND HADOOP DEVELOPER Approximate Duration - 60 Hrs Classes + 30 hrs Lab work + 20 hrs Assessment = 110 Hrs + 50 hrs Project Total duration of course = 160 hrs Lesson 00 - Course Introduction 0.1

More information

Intro to Big Data and Hadoop

Intro to Big Data and Hadoop Intro to Big and Hadoop Portions copyright 2001 SAS Institute Inc., Cary, NC, USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc., Cary, NC, USA. SAS Institute Inc. makes no warranties

More information

Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation

Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Roger Ding Cloudera February 3rd, 2018 1 Agenda Hadoop History Introduction to Apache Hadoop

More information

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW TOPICS COVERED 1 2 Fundamentals of Big Data Platforms Major Big Data Tools Scaling Up vs. Out SCALE UP (SMP) SCALE OUT (MPP) + (n) Upgrade

More information

Databricks Cloud. A Primer

Databricks Cloud. A Primer Databricks Cloud A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to

More information

Microsoft Azure Essentials

Microsoft Azure Essentials Microsoft Azure Essentials Azure Essentials Track Summary Data Analytics Explore the Data Analytics services in Azure to help you analyze both structured and unstructured data. Azure can help with large,

More information

5th Annual. Cloudera, Inc. All rights reserved.

5th Annual. Cloudera, Inc. All rights reserved. 5th Annual 1 The Essentials of Apache Hadoop The What, Why and How to Meet Agency Objectives Sarah Sproehnle, Vice President, Customer Success 2 Introduction 3 What is Apache Hadoop? Hadoop is a software

More information

Berkeley Data Analytics Stack (BDAS) Overview

Berkeley Data Analytics Stack (BDAS) Overview Berkeley Analytics Stack (BDAS) Overview Ion Stoica UC Berkeley UC BERKELEY What is Big used For? Reports, e.g., - Track business processes, transactions Diagnosis, e.g., - Why is user engagement dropping?

More information

ABOUT THIS TRAINING: This Hadoop training will also prepare you for the Big Data Certification of Cloudera- CCP and CCA.

ABOUT THIS TRAINING: This Hadoop training will also prepare you for the Big Data Certification of Cloudera- CCP and CCA. ABOUT THIS TRAINING: The world of Hadoop and Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. This comprehensive training has been designed

More information

Big data is hard. Top 3 Challenges To Adopting Big Data

Big data is hard. Top 3 Challenges To Adopting Big Data Big data is hard Top 3 Challenges To Adopting Big Data Traditionally, analytics have been over pre-defined structures Data characteristics: Sales Questions answered with BI and visualizations: Customer

More information

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake White Paper Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake Motivation for Modernization It is now a well-documented realization among Fortune 500 companies

More information

Achieving Agility and Flexibility in Big Data Analytics with the Urika -GX Agile Analytics Platform

Achieving Agility and Flexibility in Big Data Analytics with the Urika -GX Agile Analytics Platform Achieving Agility and Flexibility in Big Data Analytics with the Urika -GX Agile Analytics Platform Analytics R&D and Product Management Document Version 1 WP-Urika-GX-Big-Data-Analytics-0217 www.cray.com

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 1, 2017 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2457

More information

EXECUTIVE BRIEF. Successful Data Warehouse Approaches to Meet Today s Analytics Demands. In this Paper

EXECUTIVE BRIEF. Successful Data Warehouse Approaches to Meet Today s Analytics Demands. In this Paper Sponsored by Successful Data Warehouse Approaches to Meet Today s Analytics Demands EXECUTIVE BRIEF In this Paper Organizations are adopting increasingly sophisticated analytics methods Analytics usage

More information

Insights to HDInsight

Insights to HDInsight Insights to HDInsight Why Hadoop in the Cloud? No hardware costs Unlimited Scale Pay for What You Need Deployed in minutes Azure HDInsight Big Data made easy Enterprise Ready Easier and more productive

More information

EXAMPLE SOLUTIONS Hadoop in Azure HBase as a columnar NoSQL transactional database running on Azure Blobs Storm as a streaming service for near real time processing Hadoop 2.4 support for 100x query gains

More information

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

E-guide Hadoop Big Data Platforms Buyer s Guide part 1 Hadoop Big Data Platforms Buyer s Guide part 1 Your expert guide to Hadoop big data platforms for managing big data David Loshin, Knowledge Integrity Inc. Companies of all sizes can use Hadoop, as vendors

More information

IBM Analytics Unleash the power of data with Apache Spark

IBM Analytics Unleash the power of data with Apache Spark IBM Analytics Unleash the power of data with Apache Spark Agility, speed and simplicity define the analytics operating system of the future 1 2 3 4 Use Spark to create value from data-driven insights Lower

More information

How In-Memory Computing can Maximize the Performance of Modern Payments

How In-Memory Computing can Maximize the Performance of Modern Payments How In-Memory Computing can Maximize the Performance of Modern Payments 2018 The mobile payments market is expected to grow to over a trillion dollars by 2019 How can in-memory computing maximize the performance

More information

Architecting for Real- Time Big Data Analytics. Robert Winters

Architecting for Real- Time Big Data Analytics. Robert Winters Architecting for Real- Time Big Data Analytics Robert Winters About Me 2 ROBERT WINTERS Head of Business Intelligence, TravelBird Ten years experience in analytics, five years with Vertica and big data

More information

Hadoop Course Content

Hadoop Course Content Hadoop Course Content Hadoop Course Content Hadoop Overview, Architecture Considerations, Infrastructure, Platforms and Automation Use case walkthrough ETL Log Analytics Real Time Analytics Hbase for Developers

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

Enterprise-Scale MATLAB Applications

Enterprise-Scale MATLAB Applications Enterprise-Scale Applications Sylvain Lacaze Rory Adams 2018 The MathWorks, Inc. 1 Enterprise Integration Access and Explore Data Preprocess Data Develop Predictive Models Integrate Analytics with Systems

More information

Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes

Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes Contents Introduction...3 Hadoop s humble beginnings...4 The benefits of Hadoop...5

More information

1% + 99% = AI Popularization

1% + 99% = AI Popularization 1% + 99% = AI Popularization Unifying Data Science and Engineering Jason Bissell General Manager, APAC The beginnings of Apache Spark at UC Berkeley AMPLab funded by tech companies: Got a glimpse at their

More information

Outline of Hadoop. Background, Core Services, and Components. David Schwab Synchronic Analytics Nov.

Outline of Hadoop. Background, Core Services, and Components. David Schwab Synchronic Analytics   Nov. Outline of Hadoop Background, Core Services, and Components David Schwab Synchronic Analytics https://synchronicanalytics.com Nov. 1, 2018 Hadoop s Purpose and Origin Hadoop s Architecture Minimum Configuration

More information

Top 5 Challenges for Hadoop MapReduce in the Enterprise. Whitepaper - May /9/11

Top 5 Challenges for Hadoop MapReduce in the Enterprise. Whitepaper - May /9/11 Top 5 Challenges for Hadoop MapReduce in the Enterprise Whitepaper - May 2011 http://platform.com/mapreduce 2 5/9/11 Table of Contents Introduction... 2 Current Market Conditions and Drivers. Customer

More information

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica Accelerating Your Big Data Analytics Jeff Healey, Director Product Marketing, HPE Vertica Recent Waves of Disruption IT Infrastructu re for Analytics Data Warehouse Modernization Big Data/ Hadoop Cloud

More information

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK Are you drowning in Big Data? Do you lack access to your data? Are you having a hard time managing Big Data processing requirements?

More information

Who is Databricks? Today, hundreds of organizations around the world use Databricks to build and power their production Spark applications.

Who is Databricks? Today, hundreds of organizations around the world use Databricks to build and power their production Spark applications. Databricks Primer Who is Databricks? Databricks was founded by the team who created Apache Spark, the most active open source project in the big data ecosystem today, and is the largest contributor to

More information

Advanced Analytics With Spark Patterns For Learning From Data At Scale

Advanced Analytics With Spark Patterns For Learning From Data At Scale Advanced Analytics With Spark Patterns For Learning From Data At Scale We have made it easy for you to find a PDF Ebooks without any digging. And by having access to our ebooks online or by storing it

More information

Joining the disruption in the Asset Management Industry How to evaluate new technologies and implement new ideas like a start up company

Joining the disruption in the Asset Management Industry How to evaluate new technologies and implement new ideas like a start up company Joining the disruption in the Asset Management Industry How to evaluate new technologies and implement new ideas like a start up company Kyle Kung, PhD GX Innovation Lab State Street Global Exchange September

More information

Six Critical Capabilities for a Big Data Analytics Platform

Six Critical Capabilities for a Big Data Analytics Platform White Paper Analytics & Big Data Six Critical Capabilities for a Big Data Analytics Platform Table of Contents page Executive Summary...1 Key Requirements for a Big Data Analytics Platform...1 Vertica:

More information

EMC IT Big Data Analytics Journey. Mahmoud Ghanem Sr. Systems Engineer

EMC IT Big Data Analytics Journey. Mahmoud Ghanem Sr. Systems Engineer EMC IT Big Data Analytics Journey Mahmoud Ghanem Sr. Systems Engineer Agenda 1 2 3 4 5 Introduction To Big Data EMC IT Big Data Journey Marketing Science Lab Use Case Technical Benefits Lessons Learned

More information

Redefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer

Redefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer Redefine Big Data: EMC Data Lake in Action Andrea Prosperi Systems Engineer 1 Agenda Data Analytics Today Big data Hadoop & HDFS Different types of analytics Data lakes EMC Solutions for Data Lakes 2 The

More information

Apache Spark and R A (big data) love story?

Apache Spark and R A (big data) love story? Apache Spark and R A (big data) love story? Mark Sellors - Technical Architect @ Mango Solutions About me. Technical Architect Design and deploy analytic computing environments Not really an R user but

More information

Leveraging smart meter data for electric utilities:

Leveraging smart meter data for electric utilities: Leveraging smart meter data for electric utilities: Comparison of Spark SQL with Hive 5/16/2017 Hitachi, Ltd. OSS Solution Center Yusuke Furuyama Shogo Kinoshita Who are we? Yusuke Furuyama Solutions engineer

More information

Leveraging smart meter data for electric utilities:

Leveraging smart meter data for electric utilities: Leveraging smart meter data for electric utilities: Comparison of Spark SQL with Hive 5/16/2017 Hitachi, Ltd. OSS Solution Center Yusuke Furuyama Shogo Kinoshita Who are we? Yusuke Furuyama Solutions engineer

More information

Spark, Hadoop, and Friends

Spark, Hadoop, and Friends Spark, Hadoop, and Friends (and the Zeppelin Notebook) Douglas Eadline Jan 4, 2017 NJIT Presenter Douglas Eadline deadline@basement-supercomputing.com @thedeadline HPC/Hadoop Consultant/Writer http://www.basement-supercomputing.com

More information

Data Analytics for Semiconductor Manufacturing The MathWorks, Inc. 1

Data Analytics for Semiconductor Manufacturing The MathWorks, Inc. 1 Data Analytics for Semiconductor Manufacturing 2016 The MathWorks, Inc. 1 Competitive Advantage What do we mean by Data Analytics? Analytics uses data to drive decision making, rather than gut feel or

More information

The ETL Problem Solved: The Compelling Financial Case for Running Analytics on the Mainframe

The ETL Problem Solved: The Compelling Financial Case for Running Analytics on the Mainframe Advisory The ETL Problem Solved: The Compelling Financial Case for Running Analytics on the Mainframe Executive Summary With the introduction of analytics products tuned for the mainframe, and with improvements

More information

WELCOME TO. Cloud Data Services: The Art of the Possible

WELCOME TO. Cloud Data Services: The Art of the Possible WELCOME TO Cloud Data Services: The Art of the Possible Goals for Today Share the cloud-based data management and analytics technologies that are enabling rapid development of new mobile applications Discuss

More information

Data Analytics and CERN IT Hadoop Service. CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB

Data Analytics and CERN IT Hadoop Service. CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB Data Analytics and CERN IT Hadoop Service CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB 1 Data Analytics at Scale The Challenge When you cannot fit your workload in a desktop Data

More information

Data Ingestion in. Adobe Experience Platform

Data Ingestion in. Adobe Experience Platform Contents The challenges with data Adobe Experience Platform Data Ingestion in Adobe Experience Platform Data Ingestion Service Data Lake Conclusion Adobe Experience Platform helps customers to centralize

More information

Transforming Big Data to Business Benefits

Transforming Big Data to Business Benefits Transforming Big Data to Business Benefits Automagical EDW to Big Data Migration BI at the Speed of Thought Stream Processing + Machine Learning Platform Table of Contents Introduction... 3 Case Study:

More information

In-Memory Analytics: Get Faster, Better Insights from Big Data

In-Memory Analytics: Get Faster, Better Insights from Big Data Discussion Summary In-Memory Analytics: Get Faster, Better Insights from Big Data January 2015 Interview Featuring: Tapan Patel, SAS Institute, Inc. Introduction A successful analytics program should translate

More information

Machine Learning For Enterprise: Beyond Open Source. April Jean-François Puget

Machine Learning For Enterprise: Beyond Open Source. April Jean-François Puget Machine Learning For Enterprise: Beyond Open Source April 2018 Jean-François Puget Use Cases for Machine/Deep Learning Cyber Defense Drug Discovery Fraud Detection Aeronautics IoT Earth Monitoring Advanced

More information

TechValidate Survey Report. Converged Data Platform Key to Competitive Advantage

TechValidate Survey Report. Converged Data Platform Key to Competitive Advantage TechValidate Survey Report Converged Data Platform Key to Competitive Advantage TechValidate Survey Report Converged Data Platform Key to Competitive Advantage Executive Summary What Industry Analysts

More information

BIG DATA and DATA SCIENCE

BIG DATA and DATA SCIENCE Integrated Program In BIG DATA and DATA SCIENCE CONTINUING STUDIES Table of Contents About the Course...03 Key Features of Integrated Program in Big Data and Data Science...04 Learning Path...05 Key Learning

More information

Spark and Hadoop Perfect Together

Spark and Hadoop Perfect Together Spark and Hadoop Perfect Together Arun Murthy Hortonworks Co-Founder @acmurthy Data Operating System Enable all data and applications TO BE accessible and shared BY any end-users Data Operating System

More information

Cask Data Application Platform (CDAP) Extensions

Cask Data Application Platform (CDAP) Extensions Cask Data Application Platform (CDAP) Extensions CDAP Extensions provide additional capabilities and user interfaces to CDAP. They are use-case specific applications designed to solve common and critical

More information

Cognizant BigFrame Fast, Secure Legacy Migration

Cognizant BigFrame Fast, Secure Legacy Migration Cognizant BigFrame Fast, Secure Legacy Migration Speeding Business Access to Critical Data BigFrame speeds migration from legacy systems to secure next-generation data platforms, providing up to a 4X performance

More information

Large US Bank Boosts Insider Threat Detection by 5X with StreamAnalytix

Large US Bank Boosts Insider Threat Detection by 5X with StreamAnalytix Large US Bank Boosts Insider Threat Detection by 5X with StreamAnalytix About the customer About the Customer A large US-based financial services corporation known for its extensive credit card business

More information

Integrating MATLAB Analytics into Enterprise Applications

Integrating MATLAB Analytics into Enterprise Applications Integrating MATLAB Analytics into Enterprise Applications David Willingham 2015 The MathWorks, Inc. 1 Run this link. http://bit.ly/matlabapp 2 Key Takeaways 1. What is Enterprise Integration 2. What is

More information

Deliver Always-On, Real-Time Insights at Scale. with DataStax Enterprise Analytics

Deliver Always-On, Real-Time Insights at Scale. with DataStax Enterprise Analytics Deliver Always-On, Real-Time Insights at Scale with DataStax Enterprise Analytics CONTENTS Meeting the Needs of the Right-Now Customer...3 Introducing DSE Analytics...3 Faster Performance than Open Source

More information

Big Data The Big Story

Big Data The Big Story Big Data The Big Story Jean-Pierre Dijcks Big Data Product Mangement 1 Agenda What is Big Data? Architecting Big Data Building Big Data Solutions Oracle Big Data Appliance and Big Data Connectors Customer

More information

Operational Hadoop and the Lambda Architecture for Streaming Data

Operational Hadoop and the Lambda Architecture for Streaming Data Operational Hadoop and the Lambda Architecture for Streaming Data 2015 MapR Technologies 2015 MapR Technologies 1 Topics From Batch to Operational Workloads on Hadoop Streaming Data Environments The Lambda

More information

MapR: Converged Data Pla3orm and Quick Start Solu;ons. Robin Fong Regional Director South East Asia

MapR: Converged Data Pla3orm and Quick Start Solu;ons. Robin Fong Regional Director South East Asia MapR: Converged Data Pla3orm and Quick Start Solu;ons Robin Fong Regional Director South East Asia Who is MapR? MapR is the creator of the top ranked Hadoop NoSQL SQL-on-Hadoop Real Database time streaming

More information

EDW MODERNIZATION & CONSUMPTION

EDW MODERNIZATION & CONSUMPTION EDW MODERNIZATION & CONSUMPTION RAPIDLY. AT ANY SCALE. TRANSFORMING THE EDW TO BIG DATA/CLOUD VISUAL DATA SCIENCE AND ETL WITH APACHE SPARK FASTEST BI ON BIG DATA AT MASSIVE SCALE Table of Contents Introduction...

More information

Preface About the Book

Preface About the Book Preface About the Book We are living in the dawn of what has been termed as the "Fourth Industrial Revolution" by the World Economic Forum (WEF) in 2016. The Fourth Industrial Revolution is marked through

More information

Microsoft reinvents sales processing and financial reporting with Azure

Microsoft reinvents sales processing and financial reporting with Azure Microsoft IT Showcase Microsoft reinvents sales processing and financial reporting with Azure Core Services Engineering (CSE, formerly Microsoft IT) is moving MS Sales, the Microsoft revenue reporting

More information

Bringing the Power of SAS to Hadoop Title

Bringing the Power of SAS to Hadoop Title WHITE PAPER Bringing the Power of SAS to Hadoop Title Combine SAS World-Class Analytics With Hadoop s Low-Cost, Distributed Data Storage to Uncover Hidden Opportunities ii Contents Introduction... 1 What

More information
