Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved.
- Louise Sutton
- 6 years ago
Transcription
1 Apache Spark 2.0 GA: The General Engine for Modern Analytic Use Cases
2 Apache Spark Drives Business Innovation
Apache Spark is driving new business value that is being harnessed by technology-forward organizations.
Driving Customer Insights
- Next Best Offer (Machine Learning)
- Churn Analysis
- Click-Stream (Stream Processing)
Improving Products and Service Efficiencies
- Streaming from IoT Sources
- Connected Products/Services Analysis
- Proactive/Predictive Maintenance
Lower Business Risks
- Risk Modeling & Analysis
- Network Threat Detection
3 Spark Addresses Common Limitations
Access and Usability
One of the key advantages of Apache Spark is its intuitive and flexible API for big-data processing, available in popular programming languages. Prior to Apache Spark, users had access only to very limited, inflexible abstractions for processing large distributed data, with poor support outside Java.
Data Processing Performance
MapReduce made big strides in enabling cost-effective batch processing of large volumes of data. However, businesses continue to see a need to shorten data processing windows and consume data faster, which requires a new framework with significantly better performance.
Machine Learning at Scale
Data science and machine learning on big data are exciting areas of focus. However, they require libraries that enable building models on large distributed data and APIs that allow flexible exploration of data.
4 Apache Spark
Fast and flexible general-purpose data processing for Hadoop
- Data Engineering
- Stream Processing
- Data Science & Machine Learning
Unified API and processing engine for large-scale data
5 Spark Use Cases
Top Use Cases
- Data Processing (55%)
- Real-Time Stream Processing (44%)
- Exploratory Data Science (33%)
- Machine Learning (33%)
3 out of 8 are employing Spark in data science research
6 Why Spark at Cloudera? The Most Apache Spark Experience
[Diagram: Cloudera platform stack with layers Process, Analyze, Serve; Unified Services; Store; Integrate. Components: Batch (Spark, Hive, Pig, MapReduce); Stream (Spark); SQL (Impala); Search (Solr); SDK (Kite); Resource Management (YARN); Security (Sentry, RecordService); Filesystem (HDFS); Relational (Kudu); NoSQL (HBase); Structured (Sqoop); Unstructured (Kafka, Flume)]
Cloudera is the stress-free choice for Spark
- Support: Proactive support for Spark workloads
- Expertise: Most Spark users trained. Robust development community.
- Experience: First to ship and support. Most customers running Spark of any commercial Hadoop distribution.
Cloudera lives where your data lives
- Run Spark on-prem or in the public cloud
Cloudera makes Spark enterprise hardened
- Comprehensive management and alerting
- End-to-end security and governance
- Better multi-tenancy operation for multiple workloads
Out-of-the-box ready for end-to-end use cases
- Spark with supported, seamless integrations with other big-data tools (Kafka, HBase, Kudu, etc.)
7 Spark from Cloudera
- 57% have adopted Cloudera Spark for their most important use case, vs. 26% Hortonworks, 22% an Apache download, and 7% Databricks
- 48% of respondents said they most commonly use Spark with HBase, and 41% said they use Spark with Kafka
** Source: Taneja Group Apache Spark Market Survey
8 The One Platform Initiative
- Management: Leverage Hadoop-native resource management
- Security: Full support for Hadoop security and beyond
- Scale: Spark at petabyte scale
- Streaming: Performance, simplification & easy management of streaming workloads
- Cloud: Elastic transient workloads
9 Three Core Enterprise Applications
- Data Engineering & Science: Process data, develop & serve predictive models
- Analytic Database: ELT, reporting, exploratory business intelligence
- Operational Database: Build data-driven applications to deliver real-time insights
[Diagram: platform layers Operations; Data Management; Process, Analyze, Serve; Unified Services; Store; Integrate]
10 Cloudera's Data Engineering Solution
- Data Science Workbench (coming soon): Collaborative and secure data science workbench
- Navigator: Audit, lineage, encryption, key management, & policy lifecycles
- Search: Interactive search and immediate exploration
- Cloud Deployment: Easy deployment and flexible scaling
- Hive-on-Spark: Large-scale ETL & batch processing engine
- Spark: Modern real-time analytics engine
- Multi-Storage, Multi-Environment
11 Data Processing
12 Common Limitations
Poor Cloud Design
ETL and batch processing workloads need large amounts of compute, but only for a window of time. This causes organizations to over-provision to meet the demands of the job while the environment lies dormant most of the time, producing poor ROI.
Poor Performance
ETL and data processing take too long and often exclude important data sources that are needed to extract real value from the data collected. Traditional platforms only leverage structured data, but increasingly the data needed to offer true intelligence varies in format and delivery.
Limited Data Formats
Traditional platforms only leverage structured data and require a strategic approach to schema design. Introducing new data (unstructured, time series, nested, log data) is often complex, if not impossible. This limits analysis to data extracted from core systems.
13 Data Processing with Spark
Process large-scale unstructured and structured data in the same application
Powerful and flexible higher-order functions for arbitrary processing of structured or unstructured data:
map, flatMap, filter, union, reduceByKey, groupBy, distinct, intersection, cartesian, cogroup, sortByKey, aggregateByKey, repartition, partitionBy, coalesce, pipe, mapWith, countByKey, foreach...
Keeping it simple: SQL for common operations on structured data
- Optimized execution by the query processing engine
Seamlessly mix SQL and higher-order functions
- Within the same Scala, Java or Python Spark application
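To make the higher-order functions listed on this slide concrete, here is a minimal sketch in plain Python (standard library only, not the Spark API). The helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins that imitate the semantics of Spark's flatMap and reduceByKey on an in-memory list; a real Spark job would apply the same chain to a distributed dataset.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


def flat_map(func, records):
    """Apply func to each record and flatten the resulting iterables."""
    return chain.from_iterable(func(r) for r in records)


def reduce_by_key(func, pairs):
    """Combine all values that share a key using the binary function func."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


# Hypothetical input: three lines of unstructured text.
lines = ["spark makes etl simple", "spark streams data", "sql on spark"]

words = flat_map(str.split, lines)                  # one record per word
long_words = filter(lambda w: len(w) > 2, words)    # drop very short tokens
pairs = map(lambda w: (w, 1), long_words)           # (word, 1) pairs
counts = reduce_by_key(lambda a, b: a + b, pairs)   # sum the 1s per word

print(counts)
```

The chain reads the same way a Spark pipeline does: each step takes a function and returns a new collection, so steps compose without intermediate bookkeeping.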
14 Machine Learning
15 Machine Learning
In a recent MIT study, respondents evaluated use cases for machine learning:
- 76% used machine learning to target higher sales growth
- 40% used it to improve sales and marketing performance
- 10% used machine learning to increase product sales and reduce churn
Enterprises are using machine learning to better serve their customers with higher relevance. Machine learning models need to scale, and that is where Cloudera Enterprise excels.
** Source: Forbes Online Machine Learning Is Redefining The Enterprise In
16 Apache Spark MLlib
Collection of mainstream machine learning algorithms built on Spark, including:
- Classifiers: logistic regression, boosted trees, random forests, etc.
- Clustering: k-means, Latent Dirichlet Allocation (LDA)
- Recommender Systems: Alternating Least Squares
- Dimensionality Reduction: Principal Component Analysis (PCA) and Singular Value Decomposition (SVD)
- Feature Engineering & Selection: TF-IDF, Word2Vec, Normalizer, etc.
- Statistical Functions: Chi-Squared Test, Pearson Correlation, etc.
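As an illustration of one of the feature-engineering techniques named above, the following is a small plain-Python sketch of TF-IDF (standard library only, not MLlib itself). It assumes raw term counts for TF and one common smoothed IDF variant, log((N + 1) / (df + 1)); the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    TF is the raw term count; IDF is the smoothed form
    log((N + 1) / (df + 1)), an assumption made for this sketch.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {t: math.log((n_docs + 1) / (df + 1)) for t, df in doc_freq.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in documents
    ]


docs = [
    "spark runs machine learning at scale".split(),
    "spark streams data".split(),
    "machine learning needs data".split(),
]
weights = tf_idf(docs)

# Terms that occur in fewer documents receive a higher weight.
print(weights[0]["runs"], weights[0]["spark"])
```

Here "runs" appears in one document and "spark" in two, so "runs" is weighted more heavily within the first document.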
17 Real Time Analysis
18 Spark Streaming
Real-time and continuous processing of data streams
- Fault-tolerant and high-performance processing of continuous streams of data
- High throughput with sub-second latency
Similar API and programming paradigm for batch and stream processing
- Express complex processing logic on data streams
- Focus on the processing logic, instead of stream topology
- Re-use code across batch and streaming jobs
Simplified APIs for common streaming tasks:
- Operations on rolling windows
- Maintain and update arbitrary state for streaming events
- Incremental aggregations
Combine with MLlib for predictive analytics on streaming data
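The rolling-window and incremental-aggregation ideas on this slide can be sketched in plain Python (standard library only, not the Spark Streaming API). The class `SlidingWindowCounter` is a hypothetical name; it keeps running totals over the last few micro-batches by adding each new batch and subtracting the one that expires, rather than recounting the whole window.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Keep per-key counts over the most recent `window_size` micro-batches."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.batches = deque()   # per-batch counts still inside the window
        self.totals = Counter()  # running totals for the whole window

    def add_batch(self, events):
        batch_counts = Counter(events)
        self.batches.append(batch_counts)
        self.totals.update(batch_counts)       # add the new batch
        if len(self.batches) > self.window_size:
            expired = self.batches.popleft()
            self.totals.subtract(expired)      # remove the expired batch
            self.totals += Counter()           # drop keys whose count fell to zero
        return dict(self.totals)


# Hypothetical stream of three micro-batches with a window of two batches.
window = SlidingWindowCounter(window_size=2)
snapshots = [
    window.add_batch(batch)
    for batch in (["click", "view"], ["click"], ["buy"])
]
for snapshot in snapshots:
    print(snapshot)
```

Each call does work proportional to the size of one batch, which is the point of incremental aggregation: the cost does not grow with the length of the window.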
19 Spark Adoption
- 64% of current adopters plan to increase Apache Spark usage over the next 12 months
- Spark deployment in public cloud is projected to increase from 23% today to 36% in the future
20 Spark in the Cloud
21 Why Cloudera for Spark in the Cloud?
Rely on the most portable, cost-effective, cloud-ready data platform
Flexible Deployment
- No vendor lock-in
- Multi-cloud and on-prem
- Transient and long-running clusters
- Flexible cluster topologies
Flexible Pricing
- Pay-as-you-go cloud usage
- Traditional node-based licensing
- Spot instance support
- Grow/shrink clusters
Integrated Data Platform
- Build end-to-end data apps
- Ingest, process, explore, model, analyze, serve
- Common security, governance, metadata, management
Cloud-Native
- Direct Spark I/O from S3
- Data/metadata persistence across cluster lifecycles
- Fast self-service clusters
- Single pane of glass for multi-cluster view
22 Data Engineering and Data Science: Two Common Workload Patterns
Batch Processing / ETL (also: Testing Environments)
Only pay for what you need, when you need it
- Transient clusters
- Single user
- Sized to demand
- Object storage centric
- Cloud-native deployment
Exploratory Data Science (also: Development Environments)
Explore and analyze all data, wherever it lives, on demand
- Transient or persistent
- Single or multi-user
- Elastic workload
- HDFS or object storage
- Lift-and-shift or cloud-native deployment
23 Spark in the Cloud: Sample Architecture
- Kafka + Spark Streaming on permanent clusters, for streaming data ingest and processing
- Spark batch jobs on transient clusters, for processing or machine learning, directly read/write to the object store
- Interactive Spark or Impala for exploratory data science on permanent or transient clusters, directly read/write to the object store
- Serving tier (e.g. HBase, Search) on permanent clusters, serving data to end applications
[Diagram labels: HBase, Search, Model Server, etc.; Object Store]
24 Spark 2.0: What's New?
25 New unified API: Dataset API
Datasets bring together:
RDDs
- Object oriented
- Functional operators: map, reduceByKey, cogroup, etc.
- Compile-time type safety
DataFrames
- Structured
- Compact binary representation
- Query optimizer
- Sort/shuffle without deserialization
26 Continued Innovation: Structured Streaming
Spark Streaming 2.0
- Streams modeled as continuous DataFrames
- SQL-like syntax to author stream processing
- Opens stream processing to a wider audience, with a wide array of built-in aggregation and statistical functions
- Easier end-to-end exactly-once semantics
- Out-of-order data handling
- Increased performance
- Growing array of streaming ML functionality
27 Continued Innovation: Machine Learning Persistence
- Save and load models
- Save and load pipelines
Example pipeline stages: Bag of words, Tokenize, TF-IDF, LDA, Scale & Normalize Features, Train Classifier
* Sequence is repeated during training and scoring
** Hyper-parameter tuning: repeat sequence with different parameter values
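The value of persistence is that the same fitted sequence can be reused at scoring time. The sketch below shows the idea in plain Python with JSON (standard library only, not the Spark ML persistence API). `BagOfWordsModel`, its methods, and the file name `model.json` are all hypothetical and exist only for illustration.

```python
import json
import os
import tempfile


class BagOfWordsModel:
    """Toy fitted 'pipeline': lowercase tokenizer followed by a fixed vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)

    @classmethod
    def fit(cls, texts):
        # Learn the vocabulary from the training texts.
        vocab = sorted({tok for text in texts for tok in text.lower().split()})
        return cls(vocab)

    def transform(self, text):
        # Count how often each vocabulary term occurs in the text.
        tokens = text.lower().split()
        return [tokens.count(term) for term in self.vocabulary]

    def save(self, path):
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"vocabulary": self.vocabulary}, fh)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as fh:
            return cls(json.load(fh)["vocabulary"])


model = BagOfWordsModel.fit(["Spark saves models", "Spark loads pipelines"])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    model.save(path)                       # persist after training
    restored = BagOfWordsModel.load(path)  # reload, e.g. in a scoring job

sample = "spark loads spark models"
print(restored.transform(sample))
```

Because the restored object carries the same learned state, training and scoring apply an identical transformation, which is the property the slide's first footnote relies on.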
28 How do I get Spark 2.0?
- Download our parcel at s/spark2/2-0.html
- Read more at che-spark beta-now-available-for-cdh
29 Recommended Training for Spark Users
Apache Spark Developer Training
Cloudera University's three-day Spark course enables participants to build complete, unified big data applications.
Data Science at Scale with Spark and Hadoop
Spark and Hadoop are transforming how data scientists work by allowing interactive and iterative data analysis at scale.
Introduction to Machine Learning
The course provides an introduction to machine learning, including coverage of collaborative filtering, clustering, classification, algorithms, and data volume.
30 Thank You
Transforming Analytics with Cloudera Data Science WorkBench
Transforming Analytics with Cloudera Data Science WorkBench Process data, develop and serve predictive models. 1 Age of Machine Learning Data volume NO Machine Learning Machine Learning 1950s 1960s 1970s
More informationIntroduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation
Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Roger Ding Cloudera February 3rd, 2018 1 Agenda Hadoop History Introduction to Apache Hadoop
More informationArchitecture Optimization for the new Data Warehouse. Cloudera, Inc. All rights reserved.
Architecture Optimization for the new Data Warehouse Guido Oswald - @GuidoOswald 1 Use Cases This image cannot currently be displayed. This image cannot currently be displayed. This image cannot currently
More informationEXAMPLE SOLUTIONS Hadoop in Azure HBase as a columnar NoSQL transactional database running on Azure Blobs Storm as a streaming service for near real time processing Hadoop 2.4 support for 100x query gains
More informationMicrosoft Azure Essentials
Microsoft Azure Essentials Azure Essentials Track Summary Data Analytics Explore the Data Analytics services in Azure to help you analyze both structured and unstructured data. Azure can help with large,
More informationTaking Advantage of Cloud Elasticity and Flexibility
Taking Advantage of Cloud Elasticity and Flexibility Fred Koopmans Sr. Director of Product Management 1 Public cloud adoption is surging 2 Cloudera customers are leading the way 3 Hadoop was born for the
More information5th Annual. Cloudera, Inc. All rights reserved.
5th Annual 1 The Essentials of Apache Hadoop The What, Why and How to Meet Agency Objectives Sarah Sproehnle, Vice President, Customer Success 2 Introduction 3 What is Apache Hadoop? Hadoop is a software
More informationSpark and Hadoop Perfect Together
Spark and Hadoop Perfect Together Arun Murthy Hortonworks Co-Founder @acmurthy Data Operating System Enable all data and applications TO BE accessible and shared BY any end-users Data Operating System
More informationBIG DATA AND HADOOP DEVELOPER
BIG DATA AND HADOOP DEVELOPER Approximate Duration - 60 Hrs Classes + 30 hrs Lab work + 20 hrs Assessment = 110 Hrs + 50 hrs Project Total duration of course = 160 hrs Lesson 00 - Course Introduction 0.1
More informationSr. Sergio Rodríguez de Guzmán CTO PUE
PRODUCT LATEST NEWS Sr. Sergio Rodríguez de Guzmán CTO PUE www.pue.es Hadoop & Why Cloudera Sergio Rodríguez Systems Engineer sergio@pue.es 3 Industry-Leading Consulting and Training PUE is the first Spanish
More informationHadoop in the Cloud. Ryan Lippert, Cloudera Product Cloudera, Inc. All rights reserved.
Hadoop in the Cloud Ryan Lippert, Cloudera Product Marketing @lippertryan 1 2 Cloudera Confidential 3 Drive Customer Insights Improve Product & Services Efficiency Lower Business Risk 4 The world s largest
More informationBig data is hard. Top 3 Challenges To Adopting Big Data
Big data is hard Top 3 Challenges To Adopting Big Data Traditionally, analytics have been over pre-defined structures Data characteristics: Sales Questions answered with BI and visualizations: Customer
More informationHortonworks Connected Data Platforms
Hortonworks Connected Data Platforms MASTER THE VALUE OF DATA EVERY BUSINESS IS A DATA BUSINESS EMBRACE AN OPEN APPROACH 2 Hortonworks Inc. 2011 2016. All Rights Reserved Data Drives the Connected Car
More informationDatabricks Cloud. A Primer
Databricks Cloud A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to
More informationCask Data Application Platform (CDAP)
Cask Data Application Platform (CDAP) CDAP is an open source, Apache 2.0 licensed, distributed, application framework for delivering Hadoop solutions. It integrates and abstracts the underlying Hadoop
More informationHadoop Course Content
Hadoop Course Content Hadoop Course Content Hadoop Overview, Architecture Considerations, Infrastructure, Platforms and Automation Use case walkthrough ETL Log Analytics Real Time Analytics Hbase for Developers
More informationCask Data Application Platform (CDAP) Extensions
Cask Data Application Platform (CDAP) Extensions CDAP Extensions provide additional capabilities and user interfaces to CDAP. They are use-case specific applications designed to solve common and critical
More information20775 Performing Data Engineering on Microsoft HD Insight
Duración del curso: 5 Días Acerca de este curso The main purpose of the course is to give students the ability plan and implement big data workflows on HD. Perfil de público The primary audience for this
More information20775: Performing Data Engineering on Microsoft HD Insight
Let s Reach For Excellence! TAN DUC INFORMATION TECHNOLOGY SCHOOL JSC Address: 103 Pasteur, Dist.1, HCMC Tel: 08 38245819; 38239761 Email: traincert@tdt-tanduc.com Website: www.tdt-tanduc.com; www.tanducits.com
More informationCourse Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight.
Course Content Course Description: The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight. At Course Completion: After competing this course,
More information20775A: Performing Data Engineering on Microsoft HD Insight
20775A: Performing Data Engineering on Microsoft HD Insight Duration: 5 days; Instructor-led Implement Spark Streaming Using the DStream API. Develop Big Data Real-Time Processing Solutions with Apache
More informationMapR: Solution for Customer Production Success
2015 MapR Technologies 2015 MapR Technologies 1 MapR: Solution for Customer Production Success Big Data High Growth 700+ Customers Cloud Leaders Riding the Wave with Hadoop The Big Data Platform of Choice
More informationLeveraging Predictive Tools to Decrease Resolution Time
Leveraging Predictive Tools to Decrease Resolution Time Angus Klein Vice President, Global Support Adam Warrington Director, Engineering 1 The Value of Hadoop One place for unlimited data All types More
More information20775A: Performing Data Engineering on Microsoft HD Insight
20775A: Performing Data Engineering on Microsoft HD Insight Course Details Course Code: Duration: Notes: 20775A 5 days This course syllabus should be used to determine whether the course is appropriate
More informationBIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW
BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW TOPICS COVERED 1 2 Fundamentals of Big Data Platforms Major Big Data Tools Scaling Up vs. Out SCALE UP (SMP) SCALE OUT (MPP) + (n) Upgrade
More informationOutline of Hadoop. Background, Core Services, and Components. David Schwab Synchronic Analytics Nov.
Outline of Hadoop Background, Core Services, and Components David Schwab Synchronic Analytics https://synchronicanalytics.com Nov. 1, 2018 Hadoop s Purpose and Origin Hadoop s Architecture Minimum Configuration
More informationHortonworks Data Platform
Hortonworks Data Platform An open-architecture platform to manage data in motion and at rest Highlights Addresses a range of data-at-rest use cases Powers real-time customer applications Delivers robust
More informationIBM SPSS & Apache Spark
IBM SPSS & Apache Spark Making Big Data analytics easier and more accessible ramiro.rego@es.ibm.com @foreswearer 1 2016 IBM Corporation Modeler y Spark. Integration Infrastructure overview Spark, Hadoop
More informationC3 Products + Services Overview
C3 Products + Services Overview AI CLOUD PREDICTIVE ANALYTICS IoT Table of Contents C3 is a Computer Software Company 1 C3 PaaS Products 3 C3 SaaS Products 5 C3 Product Trials 6 C3 Center of Excellence
More informationMake Business Intelligence Work on Big Data
Make Business Intelligence Work on Big Data Speed. Scale. Simplicity. Put the Power of Big Data in the Hands of Business Users Connect your BI tools directly to your big data without compromising scale,
More informationPreface About the Book
Preface About the Book We are living in the dawn of what has been termed as the "Fourth Industrial Revolution" by the World Economic Forum (WEF) in 2016. The Fourth Industrial Revolution is marked through
More informationBusiness is being transformed by three trends
Business is being transformed by three trends Big Cloud Intelligence Stay ahead of the curve with Cortana Intelligence Suite Business apps People Custom apps Apps Sensors and devices Cortana Intelligence
More informationInsights-Driven Operations with SAP HANA and Cloudera Enterprise
Insights-Driven Operations with SAP HANA and Cloudera Enterprise Unleash your business with pervasive Big Data Analytics with SAP HANA and Cloudera Enterprise The missing link to operations As big data
More informationModernizing Your Data Warehouse with Azure
Modernizing Your Data Warehouse with Azure Big data. Small data. All data. Christian Coté S P O N S O R S The traditional BI Environment The traditional data warehouse data warehousing has reached the
More informationCloudera Data Science and Machine Learning. Robin Harrison, Account Executive David Kemp, Systems Engineer. Cloudera, Inc. All rights reserved.
Cloudera Data Science and Machine Learning Robin Harrison, Account Executive David Kemp, Systems Engineer 1 This is the age of machine learning. Data volume NO Machine Learning Machine Learning 1950s 1960s
More informationSimplifying the Process of Uploading and Extracting Data from Apache Hadoop
Simplifying the Process of Uploading and Extracting Data from Apache Hadoop Rohit Bakhshi, Solution Architect, Hortonworks Jim Walker, Director Product Marketing, Talend Page 1 About Us Rohit Bakhshi Solution
More informationApache Hadoop in the Datacenter and Cloud
Apache Hadoop in the Datacenter and Cloud The Shift to the Connected Data Architecture Digital Transformation fueled by Big Data Analytics and IoT ACTIONABLE INTELLIGENCE Cloud and Data Center IDMS Relational
More informationCloud Based Analytics for SAP
Cloud Based Analytics for SAP Gary Patterson, Global Lead for Big Data About Virtustream A Dell Technologies Business 2,300+ employees 20+ data centers Major operations in 10 countries One of the fastest
More informationSpotlight Sessions. Nik Rouda. Director of Product Marketing Cloudera, Inc. All rights reserved. 1
Spotlight Sessions Nik Rouda Director of Product Marketing Cloudera @nrouda Cloudera, Inc. All rights reserved. 1 Spotlight: Protecting Your Data Nik Rouda Product Marketing Cloudera, Inc. All rights reserved.
More informationData Analytics and CERN IT Hadoop Service. CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB
Data Analytics and CERN IT Hadoop Service CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB 1 Data Analytics at Scale The Challenge When you cannot fit your workload in a desktop Data
More informationAccelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica
Accelerating Your Big Data Analytics Jeff Healey, Director Product Marketing, HPE Vertica Recent Waves of Disruption IT Infrastructu re for Analytics Data Warehouse Modernization Big Data/ Hadoop Cloud
More informationSimplifying Data Engineering to Accelerate Innovation
Simplifying Data Engineering to Accelerate Innovation The Rise of Data Engineering With the continued growth in data generated and captured by companies across industries, the market for big data analytics
More informationOracle Big Data Cloud Service
Oracle Big Data Cloud Service Delivering Hadoop, Spark and Data Science with Oracle Security and Cloud Simplicity Oracle Big Data Cloud Service is an automated service that provides a highpowered environment
More informationAmsterdam. (technical) Updates & demonstration. Robert Voermans Governance architect
(technical) Updates & demonstration Robert Voermans Governance architect Amsterdam Please note IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
More informationBig Data Application Engineer/ Developer. Specialization in Apache Spark, Kafka, Airflow, HBase
BIG DATA COURSE Big Data Application Engineer/ Developer Specialization in Apache Spark, Kafka, Airflow, HBase In Exclusive Association with 21,347+ Participants 10,000+ Brands 1200+ Trainings 45+ Countries
More informationSOLUTION SHEET Hortonworks DataFlow (HDF ) End-to-end data flow management and streaming analytics platform
SOLUTION SHEET Hortonworks DataFlow (HDF ) End-to-end data flow management and streaming analytics platform CREATE STREAMING ANALYTICS APPLICATIONS IN MINUTES WITHOUT WRITING CODE The increasing growth
More informationHADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics
HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop
More informationManaging explosion of data. Cloudera, Inc. All rights reserved.
Managing explosion of data 1 Customer experience expectations are converging on the brand, not channel Consistent across all channels and lines of business Contextualized to present location and circumstances
More informationBringing the Power of SAS to Hadoop Title
WHITE PAPER Bringing the Power of SAS to Hadoop Title Combine SAS World-Class Analytics With Hadoop s Low-Cost, Distributed Data Storage to Uncover Hidden Opportunities ii Contents Introduction... 1 What
More informationRedefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer
Redefine Big Data: EMC Data Lake in Action Andrea Prosperi Systems Engineer 1 Agenda Data Analytics Today Big data Hadoop & HDFS Different types of analytics Data lakes EMC Solutions for Data Lakes 2 The
More informationSAS & HADOOP ANALYTICS ON BIG DATA
SAS & HADOOP ANALYTICS ON BIG DATA WHY HADOOP? OPEN SOURCE MASSIVE SCALE FAST PROCESSING COMMODITY COMPUTING DATA REDUNDANCY DISTRIBUTED WHY HADOOP? Hadoop will soon become a replacement complement to:
More informationCloudera Hadoop & Industrie 4.0 wohin mit dem Datenstrom?
Cloudera Hadoop & Industrie 4.0 wohin mit dem Datenstrom? Bernard Doering Regional Sales Director, Central Europe 1 Cloudera Hadoop Scalable Flexible Open Cost- EffecLve 2 2014 Cloudera, Inc. All rights
More informationSOLUTION SHEET End to End Data Flow Management and Streaming Analytics Platform
SOLUTION SHEET End to End Data Flow Management and Streaming Analytics Platform CREATE STREAMING ANALYTICS APPLICATIONS IN MINUTES WITHOUT WRITING CODE The increasing growth of data, especially data-in-motion,
More informationWho is Databricks? Today, hundreds of organizations around the world use Databricks to build and power their production Spark applications.
Databricks Primer Who is Databricks? Databricks was founded by the team who created Apache Spark, the most active open source project in the big data ecosystem today, and is the largest contributor to
More informationExploring Big Data and Data Analytics with Hadoop and IDOL. Brochure. You are experiencing transformational changes in the computing arena.
Brochure Software Education Exploring Big Data and Data Analytics with Hadoop and IDOL You are experiencing transformational changes in the computing arena. Brochure Exploring Big Data and Data Analytics
More informationAnalytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand
Paper 2698-2018 Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand ABSTRACT Digital analytics is no longer just about tracking the number
More informationHadoop and Analytics at CERN IT CERN IT-DB
Hadoop and Analytics at CERN IT CERN IT-DB 1 Hadoop Use cases Parallel processing of large amounts of data Perform analytics on a large scale Dealing with complex data: structured, semi-structured, unstructured
More informationGUIDE The Enterprise Buyer s Guide to Public Cloud Computing
GUIDE The Enterprise Buyer s Guide to Public Cloud Computing cloudcheckr.com Enterprise Buyer s Guide 1 When assessing enterprise compute options on Amazon and Azure, it pays dividends to research the
More informationInsights to HDInsight
Insights to HDInsight Why Hadoop in the Cloud? No hardware costs Unlimited Scale Pay for What You Need Deployed in minutes Azure HDInsight Big Data made easy Enterprise Ready Easier and more productive
More informationBerkeley Data Analytics Stack (BDAS) Overview
Berkeley Analytics Stack (BDAS) Overview Ion Stoica UC Berkeley UC BERKELEY What is Big used For? Reports, e.g., - Track business processes, transactions Diagnosis, e.g., - Why is user engagement dropping?
More informationE-guide Hadoop Big Data Platforms Buyer s Guide part 1
Hadoop Big Data Platforms Buyer s Guide part 1 Your expert guide to Hadoop big data platforms for managing big data David Loshin, Knowledge Integrity Inc. Companies of all sizes can use Hadoop, as vendors
More informationIBM Analytics Unleash the power of data with Apache Spark
IBM Analytics Unleash the power of data with Apache Spark Agility, speed and simplicity define the analytics operating system of the future 1 2 3 4 Use Spark to create value from data-driven insights Lower
More informationCommon Customer Use Cases in FSI
Common Customer Use Cases in FSI 1 Marketing Optimization 2014 2014 MapR MapR Technologies Technologies 2 Fortune 100 Financial Services Company 104M CARD MEMBERS 3 Financial Services: Recommendation Engine
More informationORACLE DATA INTEGRATOR ENTERPRISE EDITION
ORACLE DATA INTEGRATOR ENTERPRISE EDITION Oracle Data Integrator Enterprise Edition delivers high-performance data movement and transformation among enterprise platforms with its open and integrated E-LT
More informationEXECUTIVE BRIEF. Successful Data Warehouse Approaches to Meet Today s Analytics Demands. In this Paper
Sponsored by Successful Data Warehouse Approaches to Meet Today s Analytics Demands EXECUTIVE BRIEF In this Paper Organizations are adopting increasingly sophisticated analytics methods Analytics usage
More informationTHE CIO GUIDE TO BIG DATA ARCHIVING. How to pick the right product?
THE CIO GUIDE TO BIG DATA ARCHIVING How to pick the right product? The landscape of enterprise data is changing with the advent of enterprise social data, IoT, logs and click-streams. The data is too big,
More informationC3 IoT: Products + Services Overview