Guagua: An Iterative Computing Framework on Hadoop. Zhang Pengshan(David), PayPal
|
|
- Todd Harmon
- 5 years ago
- Views:
Transcription
1
2 Guagua: An Iterative Computing Framework on Hadoop Zhang Pengshan(David), PayPal
3 AGENDA Introduction Distributed Neural Network Algorithm What is Guagua? Guagua Advanced Features Shifu on Guagua Future Plans
4 AGENDA Introduction Distributed Neural Network Algorithm What is Guagua? Guagua Advanced Features Shifu on Guagua Future Plans
5 ALIPAY vs. PAYPAL Q: Where is risk control in PayPal? A: Risk control is everywhere in paypal.com.
6 FRAUD TYPES IN PAYPAL Account Take Over Fraud Types in PayPal Stolen Financials Credit Cards INR/SNAD INR: SNAD: Item Not Received Significantly Not as Described
7 RISK CONTROL IN PAYPAL Models Rules Agents
8 RISK MODELING IN PAYPAL MODELING CHALLENGES Thousands of Features Algorithms (LR, NN, DT) Big Training Data SLA (Online) Simulation
9 RISK MODELING IN PAYPAL MODELING CHALLENGES Thousands of Features Algorithms (LR, NN, DT) Big Training Data SLA (Online) Simulation Q: How to train models with TB data and thousands of features?
10 AGENDA Introduction Distributed Neural Network Algorithm What is Guagua? Guagua Advanced Features Shifu on Guagua Future Plans
11 DISTRIBUTED NEURAL NETWORK ALGORITHM* GRADIENTS: DOUBLE [] GRADIENTS: DOUBLE [] GRADIENTS: DOUBLE [] ACCUMULATE GRADIENTS UPDATE WEIGHTS 1st iteration WEIGHTS: DOUBLE [] WEIGHTS: DOUBLE [] WEIGHTS: DOUBLE [] GRADIENTS: DOUBLE [] GRADIENTS: DOUBLE [] GRADIENTS: DOUBLE [] ACCUMULATE GRADIENTS UPDATE WEIGHTS 2nd iteration WEIGHTS: DOUBLE [] WEIGHTS: DOUBLE [] WEIGHTS: DOUBLE [] * Distributed batch gradient descent algorithm.
12 DISTRIBUTED NEURAL NETWORK ALGORITHM GRADIENTS: DOUBLE [] GRADIENTS: DOUBLE [] GRADIENTS: DOUBLE [] ACCUMULATE GRADIENTS UPDATE WEIGHTS 1st iteration WEIGHTS: DOUBLE [] WEIGHTS: DOUBLE [] WEIGHTS: DOUBLE [] GRADIENTS: DOUBLE [] GRADIENTS: DOUBLE [] GRADIENTS: DOUBLE [] ACCUMULATE GRADIENTS UPDATE WEIGHTS 2nd iteration WEIGHTS: DOUBLE [] WEIGHTS: DOUBLE [] WEIGHTS: DOUBLE [] Q: How to implement it?
13 WHY NOT MAHOUT OR SPARK? Mahout Spark No distributed logistic regression & neural network. Iterative through Hadoop jobs, bad performance. No independent Spark cluster. Hadoop cluster is still 1.0 based, not YARN. Q: How to implement it in Hadoop?
14 POSSIBLE SOLUTIONS Pros Cons Hadoop YARN Flexible framework for framework Self resource management Alpha PayPal Clusters: Hadoop Extra fault tolerance, splits, UI Hadoop MapReduce Works well on all Hadoop versions Mature computing model Internal fault tolerance, splits, UI Different computing model How to do iterative coordination?
15 AGENDA Introduction Distributed Neural Network Algorithm What is Guagua? Guagua Advanced Features Shifu on Guagua Future Plans
16 ITERATIVE COMPUTING MODEL IN GUAGUA WORKER RESULT WORKER RESULT WORKER RESULT 1st iteration MASTER RESULT MASTER RESULT MASTER RESULT WORKER RESULT WORKER RESULT WORKER RESULT 2nd iteration MASTER RESULT MASTER RESULT MASTER RESULT Guagua is a framework over such iterative computing model, compared with Hadoop 1.0 over MapReduce.
17 GUAGUA API Computable Computable
18 GUAGUA OVERVIEW - s core CORE Fault Tolerance Coordination Iterative Computing Framework MapReduce Adapter (For Hadoop 1.0) YARN Adapter (For Hadoop 2.0) Consistent Client Distributed Neural Network Application
19 PLUGGABLE, SCALABLE INTERCEPTORS Fault Tolerance Interceptor Fault Tolerance Interceptor ZooKeeper Coordinator ZooKeeper Coordinator Perf Interceptor Perf Interceptor Timer Timer User Defined Interceptors User Defined Interceptors Computation Computation * These two graphs are aspects for each iteration.
20 GUAGUA RUNTIME : Mapper (Container) : Mapper (Container) : Mapper (Container) : Mapper (Container) : Mapper (Container) : Mapper (Container)
21 GUAGUA RUNTIME : Mapper (Container) REGISTER REGISTER : Mapper (Container) : Mapper (Container) REGISTER ZooKeeper Cluster REGISTER : Mapper (Container) REGISTER : Mapper (Container) REGISTER : Mapper (Container) 1. is listening znodes of workers. 2. s are listening znode of master.
22 GUAGUA RUNTIME : Mapper (Container) UPDATE ITER UPDATE ITER : Mapper (Container) : Mapper (Container) UPDATE ITER ZooKeeper Cluster UPDATE IITER : Mapper (Container) UPDATE ITER : Mapper (Container) UPDATE ITER : Mapper (Container) 1. Data is loaded in worker memory in the first iteration. 2. Whole process is done when reaches maximal iteration or halt condition is triggered.
23 AGENDA Introduction Distributed Neural Network Algorithm What is Guagua? Guagua Advanced Features Shifu on Guagua Future Plans
24 FAULT TOLERANCE n
25 FAULT TOLERANCE n * The same on workers.
26 STRAGGLER MITIGATION 1 2 3
27 STRAGGLER MITIGATION 1 2 3
28 STRAGGLER MITIGATION 1 2 3
29 PROGRESS AND STATE REPORT 0.86 = 432/501 (Current Iteration) / (Total Iteration)
30 GUAGUA UNIT
31 AGENDA Introduction Distributed Neural Network Algorithm What is Guagua? Guagua Advanced Features Shifu on Guagua Future Plans
32 WHAT IS SHIFU? Shifu* is an open-source, end-to-end machine learning and data mining framework built on top of Hadoop. NEW INIT STATS VARSELECT EVAL NORMALIZE POSTTRAIN TRAIN Built on Guagua *Want to try Shifu? Please visit
33 SHIFU ON GUAGUA (TRAIN STEP) Computable Interceptor Computable Gradients NN AbstractComputable BasicInterceptor Weights NN NNOutput GUAGUA API SHIFU CODE ENCOG CODE
34 SHIFU NN vs. SPARK LR Run Time Comparison Time Shifu-NN Spark-LR Iteration Shifu-NN: 1102*20*1 Netw ork, 319 Mappers * 1G Spark-LR: 1102 features, 120 executors * 3G
35 SHIFU NN BENCHMARK RESULTS Run Time (Seconds) Time(Seconds) G 375G 500G 625G 750G 875G 1000G Size of Input All data are located in memory. At most w e used 2400 mappers. 20 epochs are used.
36 AGENDA Introduction Distributed Neural Network Algorithm What is Guagua? Guagua Advanced Features Shifu on Guagua Future Plans
37 WHAT S NEXT? More open source docs Support more (distributed) machine learning algorithms Improve YARN (Beta) implementation Support more input formats Big model support Deep learning support
38 Q&A
39 APPENDIX Website Guagua issue website Shifu & Guagua source code:
40
41 @InfoQ infoqchina
BIG DATA AND HADOOP DEVELOPER
BIG DATA AND HADOOP DEVELOPER Approximate Duration - 60 Hrs Classes + 30 hrs Lab work + 20 hrs Assessment = 110 Hrs + 50 hrs Project Total duration of course = 160 hrs Lesson 00 - Course Introduction 0.1
More informationDeloitte School of Analytics. Demystifying Data Science: Leveraging this phenomenon to drive your organisation forward
Deloitte School of Analytics Demystifying Data Science: Leveraging this phenomenon to drive your organisation forward February 2018 Agenda 7 February 2018 8 February 2018 9 February 2018 8:00 9:00 Networking
More informationIntroduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation
Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Roger Ding Cloudera February 3rd, 2018 1 Agenda Hadoop History Introduction to Apache Hadoop
More informationEnterprise-Scale MATLAB Applications
Enterprise-Scale Applications Sylvain Lacaze Rory Adams 2018 The MathWorks, Inc. 1 Enterprise Integration Access and Explore Data Preprocess Data Develop Predictive Models Integrate Analytics with Systems
More informationIntro to Big Data and Hadoop
Intro to Big and Hadoop Portions copyright 2001 SAS Institute Inc., Cary, NC, USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc., Cary, NC, USA. SAS Institute Inc. makes no warranties
More informationNew Approach for scheduling tasks and/or jobs in Big Data Cluster
New Approach for scheduling tasks and/or jobs in Big Data Cluster IT College, Chairperson of MS Dept. Agenda Introduction What is Big Data? The 4 characteristics of Big Data V4s Different Categories of
More informationBig Data Application Engineer/ Developer. Specialization in Apache Spark, Kafka, Airflow, HBase
BIG DATA COURSE Big Data Application Engineer/ Developer Specialization in Apache Spark, Kafka, Airflow, HBase In Exclusive Association with 21,347+ Participants 10,000+ Brands 1200+ Trainings 45+ Countries
More informationDATA SCIENCE: HYPE AND REALITY PATRICK HALL
DATA SCIENCE: HYPE AND REALITY PATRICK HALL About me SAS Enterprise Miner, 2012 Cloudera Data Scientist, 2014 Do you use Kolmogorov Smirnov often? Statistician No, I mix my martinis with gin. Data Scientist
More informationABOUT THIS TRAINING: This Hadoop training will also prepare you for the Big Data Certification of Cloudera- CCP and CCA.
ABOUT THIS TRAINING: The world of Hadoop and Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. This comprehensive training has been designed
More informationBig data in practice
Big data in practice Françoise Soulié Fogelman Gilles Polart-Donat IMT - TeraLab Berkeley, CA Clark Kerr Campus Conference Center September 30 - October 2, 2014 Agenda What is Big Data Opportunities &
More informationHow In-Memory Computing can Maximize the Performance of Modern Payments
How In-Memory Computing can Maximize the Performance of Modern Payments 2018 The mobile payments market is expected to grow to over a trillion dollars by 2019 How can in-memory computing maximize the performance
More informationBIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW
BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW TOPICS COVERED 1 2 Fundamentals of Big Data Platforms Major Big Data Tools Scaling Up vs. Out SCALE UP (SMP) SCALE OUT (MPP) + (n) Upgrade
More informationStateful Services on DC/OS. Santa Clara, California April 23th 25th, 2018
Stateful Services on DC/OS Santa Clara, California April 23th 25th, 2018 Who Am I? Shafique Hassan Solutions Architect @ Mesosphere Operator 2 Agenda DC/OS Introduction and Recap Why Stateful Services
More informationCS 6604: Data Mining Large Networks and Time- series. B. Aditya Prakash Lecture #6: Hadoop and Graph Analysis
CS 0: Data Mining Large Networks and Time- series B. Aditya Prakash Lecture #: Hadoop and Graph Analysis What to do if data is really large? Peta- bytes (exabytes, ze?abytes.. ) Google processed PB of
More informationTop 5 Challenges for Hadoop MapReduce in the Enterprise. Whitepaper - May /9/11
Top 5 Challenges for Hadoop MapReduce in the Enterprise Whitepaper - May 2011 http://platform.com/mapreduce 2 5/9/11 Table of Contents Introduction... 2 Current Market Conditions and Drivers. Customer
More informationCASE STUDY Delivering Real Time Financial Transaction Monitoring
CASE STUDY Delivering Real Time Financial Transaction Monitoring Steve Wilkes Striim Co-Founder and CTO Background Customer is a US based Payment Systems Provider Large Network of ATM and Cashier Operated
More informationCOMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 1, 2017 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2457
More informationAchieving Agility and Flexibility in Big Data Analytics with the Urika -GX Agile Analytics Platform
Achieving Agility and Flexibility in Big Data Analytics with the Urika -GX Agile Analytics Platform Analytics R&D and Product Management Document Version 1 WP-Urika-GX-Big-Data-Analytics-0217 www.cray.com
More informationHadoop Course Content
Hadoop Course Content Hadoop Course Content Hadoop Overview, Architecture Considerations, Infrastructure, Platforms and Automation Use case walkthrough ETL Log Analytics Real Time Analytics Hbase for Developers
More informationTutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data
More informationCommon Customer Use Cases in FSI
Common Customer Use Cases in FSI 1 Marketing Optimization 2014 2014 MapR MapR Technologies Technologies 2 Fortune 100 Financial Services Company 104M CARD MEMBERS 3 Financial Services: Recommendation Engine
More informationAccelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica
Accelerating Your Big Data Analytics Jeff Healey, Director Product Marketing, HPE Vertica Recent Waves of Disruption IT Infrastructu re for Analytics Data Warehouse Modernization Big Data/ Hadoop Cloud
More informationBrian Macdonald Big Data & Analytics Specialist - Oracle
Brian Macdonald Big Data & Analytics Specialist - Oracle Improving Predictive Model Development Time with R and Oracle Big Data Discovery brian.macdonald@oracle.com Copyright 2015, Oracle and/or its affiliates.
More informationHPC IN THE CLOUD Sean O Brien, Jeffrey Gumpf, Brian Kucic and Michael Senizaiz
HPC IN THE CLOUD Sean O Brien, Jeffrey Gumpf, Brian Kucic and Michael Senizaiz Internet2, Case Western Reserve University, R Systems NA, Inc 2015 Internet2 HPC in the Cloud CONTENTS Introductions and Background
More informationHadoop on HPC: Integrating Hadoop and Pilot-based Dynamic Resource Management
Hadoop on HPC: Integrating Hadoop and Pilot-based Andre Luckow, Ioannis Paraskevakos, George Chantzialexiou and Shantenu Jha Overview Introduction and Motivation Background Integrating Hadoop/Spark with
More informationBerkeley Data Analytics Stack (BDAS) Overview
Berkeley Analytics Stack (BDAS) Overview Ion Stoica UC Berkeley UC BERKELEY What is Big used For? Reports, e.g., - Track business processes, transactions Diagnosis, e.g., - Why is user engagement dropping?
More informationApache Mesos. Delivering mixed batch & real-time data infrastructure , Galway, Ireland
Apache Mesos Delivering mixed batch & real-time data infrastructure 2015-11-17, Galway, Ireland Naoise Dunne, Insight Centre for Data Analytics Michael Hausenblas, Mesosphere Inc. http://www.meetup.com/galway-data-meetup/events/226672887/
More informationData Analytics for Semiconductor Manufacturing The MathWorks, Inc. 1
Data Analytics for Semiconductor Manufacturing 2016 The MathWorks, Inc. 1 Competitive Advantage What do we mean by Data Analytics? Analytics uses data to drive decision making, rather than gut feel or
More informationInsights to HDInsight
Insights to HDInsight Why Hadoop in the Cloud? No hardware costs Unlimited Scale Pay for What You Need Deployed in minutes Azure HDInsight Big Data made easy Enterprise Ready Easier and more productive
More informationMachine Learning For Enterprise: Beyond Open Source. April Jean-François Puget
Machine Learning For Enterprise: Beyond Open Source April 2018 Jean-François Puget Use Cases for Machine/Deep Learning Cyber Defense Drug Discovery Fraud Detection Aeronautics IoT Earth Monitoring Advanced
More informationE-guide Hadoop Big Data Platforms Buyer s Guide part 1
Hadoop Big Data Platforms Buyer s Guide part 1 Your expert guide to Hadoop big data platforms for managing big data David Loshin, Knowledge Integrity Inc. Companies of all sizes can use Hadoop, as vendors
More informationSmartCare. SPSS Workshop. Rick Durham - North American Advanced Analytics Channel Team IBM Corporation. Date: 5/28/2014
SPSS Workshop Key Presenter Rick Durham - North American Advanced Analytics Channel Team Date: 5/28/2014 Agenda What is Predictive Analytics? What is the architecture of the IBM/SPSS technology stack?
More informationSAS Machine Learning and other Analytics: Trends and Roadmap. Sascha Schubert Sberbank 8 Sep 2017
SAS Machine Learning and other Analytics: Trends and Roadmap Sascha Schubert Sberbank 8 Sep 2017 How Big Analytics will Change Organizations Optimization and Innovation Optimizing existing processes Customer
More informationRedefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer
Redefine Big Data: EMC Data Lake in Action Andrea Prosperi Systems Engineer 1 Agenda Data Analytics Today Big data Hadoop & HDFS Different types of analytics Data lakes EMC Solutions for Data Lakes 2 The
More informationNew Big Data Solutions and Opportunities for DB Workloads
New Big Data Solutions and Opportunities for DB Workloads Hadoop and Spark Ecosystem for Data Analytics, Experience and Outlook Luca Canali, IT-DB Hadoop and Spark Service WLCG, GDB meeting CERN, September
More informationEvolution or Revolution: Top Ten Development Trends
Evolution or Revolution: Top Ten Development Trends Jim Lundy CEO and Lead Analyst IT Development Trends: Building a Fighter Jet Agenda What are the Top Ten Trends in Development? What are the Best Practices
More informationAnalytical Capability Security Compute Ease Data Scale Price Users Traditional Statistics vs. Machine Learning In-Memory vs. Shared Infrastructure CRAN vs. Parallelization Desktop vs. Remote Explicit vs.
More informationIBM SPSS & Apache Spark
IBM SPSS & Apache Spark Making Big Data analytics easier and more accessible ramiro.rego@es.ibm.com @foreswearer 1 2016 IBM Corporation Modeler y Spark. Integration Infrastructure overview Spark, Hadoop
More informationData Analytics with MATLAB Adam Filion Application Engineer MathWorks
Data Analytics with Adam Filion Application Engineer MathWorks 2015 The MathWorks, Inc. 1 Case Study: Day-Ahead Load Forecasting Goal: Implement a tool for easy and accurate computation of dayahead system
More informationFINANCE + DEEP LEARNING SKYMIND / DEEPLEARNING4J 2015
FINANCE + DEEP LEARNING SKYMIND / DEEPLEARNING4J 2015 WHICH OUTCOMES MATTER? BASIC APPLICATIONS Anomaly Detection for Compliance Rogue Traders, Fat Fingers Fraud, Money Laundering Detection Trading Strategies
More informationUsing the Blaze Engine to Run Profiles and Scorecards
Using the Blaze Engine to Run Profiles and Scorecards 1993, 2016 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording
More informationBuilding a Multi-Tenant Infrastructure for Diverse Application Workloads
Building a Multi-Tenant Infrastructure for Diverse Application Workloads Rick Janowski Marketing Manager IBM Platform Computing 1 The Why and What of Multi-Tenancy 2 Parallelizable problems demand fresh
More informationHadoop and Analytics at CERN IT CERN IT-DB
Hadoop and Analytics at CERN IT CERN IT-DB 1 Hadoop Use cases Parallel processing of large amounts of data Perform analytics on a large scale Dealing with complex data: structured, semi-structured, unstructured
More informationModernizing Your Data Warehouse with Azure
Modernizing Your Data Warehouse with Azure Big data. Small data. All data. Christian Coté S P O N S O R S The traditional BI Environment The traditional data warehouse data warehousing has reached the
More informationINFORMS Analytics Maturity Model. User Guide
INFORMS Analytics Maturity Model User Guide Introducing the INFORMS Scorecard INFORMS, the leading association for advanced analytics professionals, seeks to advance the practice, research, methods, and
More informationIntegrating Artificial Intelligence and Simulation Modeling
www.pwc.com Integrating Artificial Intelligence and Simulation Modeling PwC Artificial Intelligence Accelerator Fall 2017 PwC Artificial Intelligence Accelerator is working with AnyLogic simulation and
More informationMachine Learning Logistic Regression Hamid R. Rabiee Spring 2015
Machine Learning Logistic Regression Hamid R. Rabiee Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Probabilistic Classification Introduction to Logistic regression Binary logistic regression
More informationBuilding Efficient Large-Scale Big Data Processing Platforms
University of Massachusetts Boston ScholarWorks at UMass Boston Graduate Doctoral Dissertations Doctoral Dissertations and Masters Theses 5-31-2017 Building Efficient Large-Scale Big Data Processing Platforms
More informationSpark, Hadoop, and Friends
Spark, Hadoop, and Friends (and the Zeppelin Notebook) Douglas Eadline Jan 4, 2017 NJIT Presenter Douglas Eadline deadline@basement-supercomputing.com @thedeadline HPC/Hadoop Consultant/Writer http://www.basement-supercomputing.com
More informationSynthetic Data Generation for Realistic Analytics Examples and Testing
Synthetic Data Generation for Realistic Analytics Examples and Testing Ronald J. Nowling Red Hat, Inc. rnowling@redhat.com http://rnowling.github.io/ Who Am I? Software Engineer at Red Hat Data Science
More informationHow to develop Data Scientist Super Powers! Using Azure from R to scale and persist analytic workloads.. Simon Field
How to develop Data Scientist Super Powers! Using Azure from R to scale and persist analytic workloads.. Simon Field Topics Why cloud Managing cloud resources from R Highly parallelised model training
More informationDisclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme
VIRT1400BU Real-World Customer Architecture for Big Data on VMware vsphere Joe Bruneau, General Mills Justin Murray, Technical Marketing, VMware #VMworld #VIRT1400BU Disclaimer This presentation may contain
More informationIntro Logistic Regression Gradient Descent + SGD
Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade March 29, 2016 1 Ad Placement
More informationTransforming Analytics with Cloudera Data Science WorkBench
Transforming Analytics with Cloudera Data Science WorkBench Process data, develop and serve predictive models. 1 Age of Machine Learning Data volume NO Machine Learning Machine Learning 1950s 1960s 1970s
More informationOne Year Executive Program in Applied Business Analytics
Vivekanand Education Society Institute of Management Studies & Research Big Data & Advanced Analytics One Year Executive Program in Applied Business Analytics In a rapidly changing global environment,
More informationHortonworks Data Platform
Hortonworks Data Platform An open-architecture platform to manage data in motion and at rest Highlights Addresses a range of data-at-rest use cases Powers real-time customer applications Delivers robust
More informationData Analytics and CERN IT Hadoop Service. CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB
Data Analytics and CERN IT Hadoop Service CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB 1 Data Analytics at Scale The Challenge When you cannot fit your workload in a desktop Data
More information5th Annual. Cloudera, Inc. All rights reserved.
5th Annual 1 The Essentials of Apache Hadoop The What, Why and How to Meet Agency Objectives Sarah Sproehnle, Vice President, Customer Success 2 Introduction 3 What is Apache Hadoop? Hadoop is a software
More informationReal Applications of Big Data. Challenges and Opportunities
Real Applications of Big Data. Challenges and Opportunities David Gil University of Alicante Polytechnic School - EPSA Department of Computer Technology Introduction to Big Data Projects Benchmarking (weka,
More informationBIG DATA ANALYTICS WITH HADOOP. 40 Hour Course
1 BIG DATA ANALYTICS WITH HADOOP 40 Hour Course OVERVIEW Learning Objectives Understanding Big Data Understanding various types of data that can be stored in Hadoop Setting up and Configuring Hadoop in
More informationPredictive Maintenance and Condition Monitoring
Predictive Maintenance and Condition Monitoring Tim Yeh Application Engineer 2018 The MathWorks, Inc. 1 Agenda What is Predictive Maintenance? Algorithm behind Remaining Useful Life (RUL) Estimation Methods
More informationCopyright 2013 Oracle and/or its affiliates. All rights reserved.
1 The Value of Big Data and Analytics in Government Jan 22, 2014 Wayne Babby, Deputy Director (A), California Department of Corrections & Rehabilitation Tim Dexter, Solution Architect, Analytics Practice,
More informationCluster management at Google
Cluster management at Google LISA 2013 john wilkes (johnwilkes@google.com) Software Engineer, Google, Inc. We own and operate data centers around the world http://www.google.com/about/datacenters/inside/locations/
More informationDEVELOP QUALITY CHARACTERISTICS BASED QUALITY EVALUATION PROCESS FOR READY TO USE SOFTWARE PRODUCTS
DEVELOP QUALITY CHARACTERISTICS BASED QUALITY EVALUATION PROCESS FOR READY TO USE SOFTWARE PRODUCTS Daiju Kato 1 and Hiroshi Ishikawa 2 1 WingArc1st Inc., Tokyo, Japan kato.d@wingarc.com 2 Graduate School
More informationConquer Dark Data by Unleasing the Power of SAP Analytics
Conquer Dark Data by Unleasing the Power of SAP Analytics Natasa Mastellou, Business Intelligence Solution Advisor, SAP Hellas, Cyprus & Malta 1 IoT is everywhere Connected Cars Smart Equipment Smart Logistics
More informationNational Occupational Standard
National Occupational Standard Overview This unit is about performing research and designing a variety of algorithmic models for internal and external clients 19 National Occupational Standard Unit Code
More informationOn Cloud Computational Models and the Heterogeneity Challenge
On Cloud Computational Models and the Heterogeneity Challenge Raouf Boutaba D. Cheriton School of Computer Science University of Waterloo WCU IT Convergence Engineering Division POSTECH FOME, December
More informationSt Louis CMG Boris Zibitsker, PhD
ENTERPRISE PERFORMANCE ASSURANCE BASED ON BIG DATA ANALYTICS St Louis CMG Boris Zibitsker, PhD www.beznext.com bzibitsker@beznext.com Abstract Today s fast-paced businesses have to make business decisions
More informationReal World Use Cases: Hadoop & NoSQL in Production. Big Data Everywhere London 4 June 2015
Real World Use Cases: Hadoop & NoSQL in Production Ted Dunning Big Data Everywhere London 4 June 2015 1 Contact Information Ted Dunning Chief Applications Architect at MapR Technologies Committer & PMC
More informationBig Data Trends Arató Bence. BI Consulting
Big Data Trends 2017 Arató Bence BI Consulting arato@biconsulting.hu 1 Introduction Arató Bence Consulting and Advisory BI/DW/Big Data strategy, Architecture planning, vendor and tool selection. Also provides
More informationPowered by Tech Mahindra MAKE IT BIG WITH BIG DATA ANALYTICS
Powered by Tech Mahindra MAKE IT BIG WITH BIG DATA ANALYTICS www.upxacademy.com 1800-123-1260 About us UpX Academy is an ed-tech platform providing advanced professional training in Big Data Analytics
More informationPowered by. Tech Mahindra MAKE IT BIG WITH BIG DATA ANALYTICS
Powered by Tech Mahindra MAKE IT BIG WITH BIG DATA ANALYTICS www.upxacademy.com 1800-123-1260 About us UpX Academy is an ed-tech platform providing advanced professional training in Big Data Analytics
More informationProcessing over a trillion events a day CASE STUDIES IN SCALING STREAM PROCESSING AT LINKEDIN
Processing over a trillion events a day CASE STUDIES IN SCALING STREAM PROCESSING AT LINKEDIN Processing over a trillion events a day CASE STUDIES IN SCALING STREAM PROCESSING AT LINKEDIN Jagadish Venkatraman
More informationGot Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes
Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes Contents Introduction...3 Hadoop s humble beginnings...4 The benefits of Hadoop...5
More informationRealising Value from Data
Realising Value from Data Togetherwith Open Source Drives Innovation & Adoption in Big Data BCS Open Source SIG London 1 May 2013 Timings 6:00-6:30pm. Register / Refreshments 6:30-8:00pm, Presentation
More informationBIG DATA IN BANKING Pravin Burra, Sept 2018
BIG DATA IN BANKING Pravin Burra, Sept 2018 1 November 2018 Data Use Cases Capacity Alignment Banks have large volumes of data In any large organisation including Budgets, technology and capacity IT scheduling
More informationPowered by. Tech Mahindra MAKE IT BIG WITH BIG DATA ANALYTICS
Powered by Tech Mahindra MAKE IT BIG WITH BIG DATA ANALYTICS www.upxacademy.com 1800-123-1260 About us UpX Academy is an ed-tech platform providing advanced professional training in Big Data Analytics
More informationReal-time IoT Big Data-in-Motion Analytics Case Study: Managing Millions of Devices at Country-Scale
Real-time IoT Big Data-in-Motion Analytics Case Study: Managing Millions of Devices at Country-Scale Real-time IoT Big Data-in-Motion Analytics Case Study: Managing Millions of Devices at Country-Scale
More informationHP Software EMEA Performance Tour Zurich, Switzerland September 18
HP Software EMEA Performance Tour 2013 Zurich, Switzerland September 18 HP Service Virtualization Virtuelle Services im Software Entwicklungs-Lebenszyklus Bernhard Weiss Bernd Schindelasch 18. September,
More informationYour Top 5 Reasons Why You Should Choose SAP Data Hub INTERNAL
Your Top 5 Reasons Why You Should Choose INTERNAL Top 5 reasons for choosing the solution 1 UNIVERSAL 2 INTELLIGENT 3 EFFICIENT 4 SCALABLE 5 COMPLIANT Universal view of the enterprise and Big Data: Get
More informationEnergy Efficient MapReduce Task Scheduling on YARN
Energy Efficient MapReduce Task Scheduling on YARN Sushant Shinde 1, Sowmiya Raksha Nayak 2 1P.G. Student, Department of Computer Engineering, V.J.T.I, Mumbai, Maharashtra, India 2 Assistant Professor,
More informationLeveraging Big Data For Payment Risk Management
Payments for platform businesses. New York 2014 Leveraging Big Data For Payment Risk Management John Canfield, VP Risk Management, WePay @JCRisk" June 11, 2014 Outline 1. Payments opportunity for the bottoms-up
More informationMapR: Solution for Customer Production Success
2015 MapR Technologies 2015 MapR Technologies 1 MapR: Solution for Customer Production Success Big Data High Growth 700+ Customers Cloud Leaders Riding the Wave with Hadoop The Big Data Platform of Choice
More informationScalable Distributed Online Machine Learning Framework for Realtime Analysis of Big Data
Scalable Distributed Online Machine Learning Framework for Realtime Analysis of Big Data Hiroyuki MAKINO, NTT Software Innovation Center XLDB2012 Sep. 12, 2012 Objective of Jubatus To satisfy Scalable,
More informationLOYALTY MANAGEMENT FOR RETAIL
RSM TECHNOLOGY ACADEMY elearning Syllabus and Agenda LOYALTY MANAGEMENT FOR RETAIL FOR MICROSOFT DYNAMICS AX Course Details 3 Audience 3 Continuing Professional Education 3 Registration and Payment 3 Refund
More informationCloudera Data Science and Machine Learning. Robin Harrison, Account Executive David Kemp, Systems Engineer. Cloudera, Inc. All rights reserved.
Cloudera Data Science and Machine Learning Robin Harrison, Account Executive David Kemp, Systems Engineer 1 This is the age of machine learning. Data volume NO Machine Learning Machine Learning 1950s 1960s
More informationCloudera, Inc. All rights reserved.
1 Data Analytics 2018 CDSW Teamplay und Governance in der Data Science Entwicklung Thomas Friebel Partner Sales Engineer tfriebel@cloudera.com 2 We believe data can make what is impossible today, possible
More informationHow to build and deploy machine learning projects
How to build and deploy machine learning projects Litan Ilany, Advanced Analytics litan.ilany@intel.com Agenda Introduction Machine Learning: Exploration vs Solution CRISP-DM Flow considerations Other
More informationCourse Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight.
Course Content Course Description: The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight. At Course Completion: After competing this course,
More informationFrom Information to Insight: The Big Value of Big Data. Faire Ann Co Marketing Manager, Information Management Software, ASEAN
From Information to Insight: The Big Value of Big Data Faire Ann Co Marketing Manager, Information Management Software, ASEAN The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT
More informationGPU ACCELERATED BIG DATA ARCHITECTURE
INNOVATION PLATFORM WHITE PAPER 1 Today s enterprise is producing and consuming more data than ever before. Enterprise data storage and processing architectures have struggled to keep up with this exponentially
More information20775A: Performing Data Engineering on Microsoft HD Insight
20775A: Performing Data Engineering on Microsoft HD Insight Duration: 5 days; Instructor-led Implement Spark Streaming Using the DStream API. Develop Big Data Real-Time Processing Solutions with Apache
More informationData Center Operating System (DCOS) IBM Platform Solutions
April 2015 Data Center Operating System (DCOS) IBM Platform Solutions Agenda Market Context DCOS Definitions IBM Platform Overview DCOS Adoption in IBM Spark on EGO EGO-Mesos Integration 2 Market Context
More informationData Science, realizing the Hype Cycle. Luigi Di Rito, Director Data Science Team, SAP Center of Excellence
Data Science, realizing the Hype Cycle. Luigi Di Rito, Director Data Science Team, SAP Center of Excellence Data Science, Machine Learning and Artificial Intelligence Deep Learning AREAS OF AI Rule-based
More informationBig Data Foundation. 2 Days Classroom Training PHILIPPINES :: MALAYSIA :: VIETNAM :: SINGAPORE :: INDIA
Big Data Foundation 2 Days Classroom Training PHILIPPINES :: MALAYSIA :: VIETNAM :: SINGAPORE :: INDIA Content Big Data Foundation Course Introduction Who we are Course Overview Career Path Course Content
More informationAutomating Netflix ML Pipelines With Meson. QCon SF 2017 Eugen Cepoi, Davis Shepherd
Automating Netflix ML Pipelines With Meson QCon SF 2017 Eugen Cepoi, Davis Shepherd ? Goal Create a personalized experience to help members find content to watch and enjoy Recommendation Context Online
More informationDeep Fraud. Doing research with financial data
Deep Fraud Doing research with financial data Chapter 1: Problem Statement Each time a new card payment arrives to our mainframe, provide a risk score (from 0 to 99) Apply this score together with some
More information20775 Performing Data Engineering on Microsoft HD Insight
Duración del curso: 5 Días Acerca de este curso The main purpose of the course is to give students the ability plan and implement big data workflows on HD. Perfil de público The primary audience for this
More informationBusiness is being transformed by three trends
Business is being transformed by three trends Big Cloud Intelligence Stay ahead of the curve with Cortana Intelligence Suite Business apps People Custom apps Apps Sensors and devices Cortana Intelligence
More informationResource Scheduling Architectural Evolution at Scale and Distributed Scheduler Load Simulator
Resource Scheduling Architectural Evolution at Scale and Distributed Scheduler Load Simulator Renyu Yang Supported by Collaborated 863 and 973 Program Resource Scheduling Problems 2 Challenges at Scale
More information