The Evolution of Big Data

Similar documents
KnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration

Brian Macdonald Big Data & Analytics Specialist - Oracle

: What are examples of data science jobs?

In-Memory Analytics: Get Faster, Better Insights from Big Data

Using Predictive Analytics to Detect Contract Fraud, Waste, and Abuse Case Study from U.S. Postal Service OIG

Big Data The Big Story

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

Data Informatics. Seon Ho Kim, Ph.D.

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake

IBM Analytics Unleash the power of data with Apache Spark

Sunnie Chung. Cleveland State University

DataAdapt Active Insight

DATENBANK TRANSFORMATION WARUM, WOHIN UND WIE? 15. Mai 2018

ABOUT THIS TRAINING: This Hadoop training will also prepare you for the Big Data Certification of Cloudera- CCP and CCA.

Statistics & Optimization with Big Data

Cognizant BigFrame Fast, Secure Legacy Migration

Building Your Big Data Team

Data IBM. Education for our Data Scientists. Emily Plachy, Distinguished Engineer, IBM Global Chief Data Office May 1, 2017

Oracle Retail Data Model (ORDM) Overview

Bringing the Power of SAS to Hadoop Title

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW

Roles and Processes in Analytics Development

Spark and Hadoop Perfect Together

Knowledge Discovery and Data Mining

Analytics in Action transforming the way we use and consume information

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

The Power and Peril! of Predictive Analytics

Intro to Big Data and Hadoop

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

Top 5 Challenges for Hadoop MapReduce in the Enterprise. Whitepaper - May /9/11

The Big Opportunities at the Junction of AI and Analytics: Interview with Tom Davenport

Datametica. The Modern Data Platform Enterprise Data Hub Implementations. Why is workload moving to Cloud

IBM Db2 Warehouse. Hybrid data warehousing using a software-defined environment in a private cloud. The evolution of the data warehouse

Contents at a Glance COPYRIGHTED MATERIAL. Introduction... 1 Part I: Getting Started with Big Data... 7

Enterprise Database Systems for Big Data, Big Data Processing, and Big Data Analytics. Sunnie Chung Cleveland State University 1

From Information to Insight: The Big Value of Big Data. Faire Ann Co Marketing Manager, Information Management Software, ASEAN

BIG DATA AND HADOOP DEVELOPER

5th Annual. Cloudera, Inc. All rights reserved.

White Paper: SAS and Apache Hadoop For Government. Inside: Unlocking Higher Value From Business Analytics to Further the Mission

High-Performance Computing (HPC) Up-close

Managing Data Warehouse Growth in the New Era of Big Data

Nouvelle Génération de l infrastructure Data Warehouse et d Analyses

SAP Predictive Analytics Suite

Advancing Information Management and Analysis with Entity Resolution. Whitepaper ADVANCING INFORMATION MANAGEMENT AND ANALYSIS WITH ENTITY RESOLUTION

IBM PureData System for Analytics Overview

A complete service guide for MICROSOFT DATA ANALYTICS ENABLEMENT

Big risks require big data thinking: EY Forensic Data Analytics (FDA), powered by IBM

IBM SPSS Modeler Personal

IBM SPSS Modeler Personal

Modern Payment Fraud Prevention at Big Data Scale

Meetup DB2 LUW - Madrid. IBM dashdb. Raquel Cadierno Torre IBM 1 de Julio de IBM Corporation

Realising Value from Data

TDWI Analytics Fundamentals. Course Outline. Module One: Concepts of Analytics

Big Data Introduction

Microsoft Azure Essentials

Outline of Hadoop. Background, Core Services, and Components. David Schwab Synchronic Analytics Nov.

Free Cognitive Computing And Big Data Analytics Ebooks Online

T H E B O T T O M L I N E

White Paper. Return on Information: The New ROI. Getting value from data

Hybrid Data Management

Cognitive Solutions in the Context of IBM Systems Cognitive Analytics / Integration Scenarios / Use Cases

WELCOME TO. Cloud Data Services: The Art of the Possible

Changing The Business landscape SAS and Open Source, Better Together. Dr Mark Chia, Head of Advanced Analytics, SAS

The Alpine Data Platform

Garanti Bank s Journey to Big Data Ayşen Büyükakın Business Intelligence & Analytics Unit Manager

Copyright 2013 Oracle and/or its affiliates. All rights reserved.

The ETL Problem Solved: The Compelling Financial Case for Running Analytics on the Mainframe

IBM Briefing Hva med 2017?

Analytics for Banks. September 19, 2017

FINANCE + DEEP LEARNING SKYMIND / DEEPLEARNING4J 2015

CS4491/CS 7265 BIG DATA ANALYTICS

Managing Data to Maximize Smart Grid Benefits

Big Data Application Engineer/ Developer. Specialization in Apache Spark, Kafka, Airflow, HBase

POWER REAL-TIME TELCO NETWORK OPERATIONS WITH EXTREME ANALYTICS

Operationalizing Analytics

St Louis CMG Boris Zibitsker, PhD

TeraCrunch Solution Blade White Paper

EXECUTIVE BRIEF. Successful Data Warehouse Approaches to Meet Today s Analytics Demands. In this Paper

SmartCare. SPSS Workshop. Rick Durham - North American Advanced Analytics Channel Team IBM Corporation. Date: 5/28/2014

Nicolas AMEYE Contact details +32 (0)

INTRODUCTION TO R FOR DATA SCIENCE WITH R FOR DATA SCIENCE DATA SCIENCE ESSENTIALS INTRODUCTION TO PYTHON FOR DATA SCIENCE. Azure Machine Learning

Big Data: Essential Elements to a Successful Modernization Strategy

How Data Science is Changing the Way Companies Do Business Colin White

Analytic Workloads on Oracle and ParAccel

DLT AnalyticsStack. Powering big data, analytics and data science strategies for government agencies

BOOSTING HEALTH CARE PAYER PERFORMANCE WITH ADVANCED ANALYTICS

Monetizing the Lake. Kirk Haslbeck, Hortonworks Dan Kernaghan, Pitney Bowes

DATA SCIENCE: HYPE AND REALITY PATRICK HALL

DOWNTIME IS NOT AN OPTION

Building a Data Lake with Spark and Cassandra Brendon Smith & Mayur Ladwa

ITG STATUS REPORT. Bottom-line Advantages of IBM InfoSphere Warehouse. Overview. May 2011

Hamburg 20 September, 2018

Designing an Analytics Strategy for the 21 st Century

Cognitive Data Warehouse and Analytics

2018 Analytics Challenge Winner Presentation

Goya Deep Learning Inference Platform. Rev. 1.2 November 2018

Oracle 全数据平台解决方案 : 打破技术壁垒, 释放数据能量. Sally Piao 甲骨文公司全球研发副总裁

SAS GRID Assesments from d-wise

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Applying analytics to compliance programs

Transcription:

The Evolution of Big Data Andrew Fast, Ph.D. Chief Scientist fast@elderresearch.com Headquarters 300 W. Main Street, Suite 301 Charlottesville, VA 22903 434.973.7673 fax 434.973.7875 www.elderresearch.com Copyright 2016 Elder Research, Inc. Office Locations Arlington, VA Linthicum, MD Raleigh, NC

The Basics This is a computer. A computer can run advanced algorithms for processing data Advanced algorithms can produce significant value for organizations. 2

Three Key Resources CPU Runs the Algorithms Memory Caches Data Disk Data Storage 3

OK, four 4

Four Vs of Big Data 5

My definition Big Data : data that requires more processing resources to produce value than are available within the constraints of the current business problem. 6

Speed Matters 7 http://blog.infinio.com/relative-speeds-from-ram-to-flash-to-disk

Speed Matters http://blog.infinio.com/relative-speeds-from-ram-to-flash-to-disk 8

A Word about Processors For the purposes of this talk, processors are the algorithms and methods used to process and summarize the data. Statistical Models Query Languages Etc. 9

The Tree of Life The world of data science has evolved. Many claim Big Data is the pinnacle of evolution. Is it? Big Data In Database Analytics Small Data Primordial Data 10

Primordial Data Characterized by data and processing all contained on a single machine. What could possibly go wrong? Well, data grows. 11

Small Data Move storage off the local machine into a database Combines storage in RDBMS with in-core analytics Limited interaction between Data Storage and Analytical Processing Queries Data Relational Databases (Oracle, SQLServer, MySQL) In-core Analytics (SAS, IBM, R, Python, Excel, etc.)

Small Data Strengths: Mature Technologies Defined processing standards Limitations: Requires expensive, proprietary hardware to scale to large and/or high-velocity data Algorithms limited by slow data access Data Volume Either speed or size not both Handles small, slow data exceptionally well Costly and challenging to work here Either speed or size not both Data Velocity

In-Database Analytics Remove data transfer burdens and singlemachine limitations by bringing analytics to the data Database-specific implementations of analytics algorithms Recognition that storage is not the only factor Relational Databases and Data Appliances (SQLServer, PostgreSQL, Oracle ExaData Teradata, Netezza, etc.)

In-Database Analytics Strengths: Improves speed of analytics through reduced data transfer Optimized performance Mature storage technology Limitations: Implementations are database specific Not all analytic algorithms lend themselves to in-database analytics Expensive, proprietary hardware and/or software usually required

What could possibly go wrong? Data grows (sometimes very quickly)! RDBMSs and Data Appliances are still costly to upgrade when more space is needed Vendor lock-in due to proprietary system Innovation happens in bursts, rarely at industry leader 16

Big Data Collection of largely open-source technologies for largescale data storage and processing on commodity hardware Massively parallel integration of storage and processing Analysis of extremely large datasets now possible on commodity hardware Massively Parallel storage and processing on commodity hardware (NoSQL, Hadoop, Hive, Cassandra, etc.)

Limited Big Data It is possible to treat Big Data as only a storage mechanism, not a processing engine Revert back to Small data paradigm Queries Data In-core Analytics (SAS, IBM, R, etc) 18

Big Data Strengths: Designed for large, highvelocity data Decoupling of storage and processing, but still at scale Limitations: New tools required (not based on SQL) Some analytics are slower Data Volume Parallelization overhead leads to costs here Built for large, high velocity datasets Data Velocity

What could possibly go wrong? Most advanced analytics also require multi-variable computations such as correlations or matrix inversions Costly in distributed environment Latency can still be an issue 20

The Missing Link: Big Memory Big Data solves the storage problem using data distribution on commodity hardware Requires Big Algorithms using in-database strategies. All analytical processing must be distributed with the data Now, Big Memory to make it all work fast 21

No More One-Size fits all Successful Big Data software solutions have very specific use cases Tabular Graph Streaming Log Processing GraphX Memory Storage List of tools is not an exhaustive list 22

Four Vs of Big Data 23

Introducing Analytics 3.0 A new era of analytics A new resolve to apply powerful data-gathering and analysis methods not just to a company s operations but also to its offerings Coined by Tom Davenport, Distinguished Professor of IT and Management at Babson College Founder of International Institute for Analytics Author of Competing on Analytics 24

Road to Pervasive Analytics Marketing Marketing Strategic Management Operations Strategic Management Operations Marketing Strategic Operations Specialty Markets Corporate Operations Reinsurance Management Specialty Markets Corporate Operations Reinsurance Claims Underwriting Specialty Markets Corporate Operations Reinsurance Claims Underwriting IT IT Claims Underwriting IT 1.0 Small Data Business Intelligence Reporting 2.0 Big Data Integration of external data sources Visual and Descriptive Analytics 3.0 Advanced Analytics Big and Small Data together Analytics central to strategy

Variety is the Key to Value Big Data systems integrate information from varied sources for deeper/broader understanding Sue Feldman, CEO of Synthexsis (TAW San Francisco, 2013) Not just for Silicon Valley startups anymore Anyone can collect and acquire data to create mash-ups and increase value

Summary Your storage mechanism is the limiter on analytic processing Evolve the storage to evolve the analytics Variety is the big V that drives Value. Seek it out! 27

Andrew Fast Chief Scientist, Elder Research, Inc. Dr. Andrew Fast leads research in Text Mining and Social Network Analysis at Elder Research, the nation s leading data mining consultancy. ERI was founded in 1995 and has offices in Charlottesville VA and Washington DC,(www.datamininglab.com). ERI focuses on Federal, commercial, investment, and security applications of advanced analytics, including stock selection, image recognition, biometrics, process optimization, cross-selling, drug efficacy, credit scoring, risk management, and fraud detection. Dr. Fast graduated Magna Cum Laude from Bethel University and earned Master s and Ph.D. degrees in Computer Science from the University of Massachusetts Amherst. There, his research focused on causal data mining and mining complex relational data such as social networks. At ERI, Andrew leads the development of new tools and algorithms for data and text mining for applications of capabilities assessment, fraud detection, and national security. Dr. Fast has published on an array of applications including detecting securities fraud using the social network among brokers, and understanding the structure of criminal and violent groups. Other publications cover modeling peer-to-peer music file sharing networks, understanding how collective classification works, and predicting playoff success of NFL head coaches (work featured on ESPN.com). With John Elder and other co-authors, Andrew has written a book on Practical Text Mining, that was awarded the prose Award for Computing and Information Science in 2012. 28

About Elder Research 29

Elder Research Highlights Experience Most experienced consultancy in Data Science and Predictive Analytics (Founded in 1995) Industry Leaders Deep experience in predictive analytics, anomaly detection, and workload prioritization Thought Leaders We bridge the gap between academia and industry to bring advanced algorithms and knowledge discovery to clients across industry lines Solution Neutral Experts in applying COTS, open-source, and custom data mining products Software Development Award-winning capability in robust software application development, including commercial deployment of scientific and engineering solutions 30

Areas of Expertise Data Mining and Predictive Analytics Discovering patterns in past data that can be used to predict the outcome of future events including statistical modeling, classification & analysis, clustering, optimization & simulation, and customer segmentation Text Mining Understanding information stored in text documents and databases including document classification, natural language processing, information extraction and search Scientific Software Engineering Aiding clients in data mining, engineering and physical sciences by transforming laboratory-grade software innovations into world-class commercial-grade applications Data Visualization Making advanced algorithms easily accessible through 2-D & 3-D, statistical and spatial visualization Education & Training Data mining workshops and courses (100+); keynote, plenary, and conference talks; university courses 31

Books Written By Elder Research Book written for practitioners by practitioners (May 2009) 2009 Prose Award Winner How to combine models for improved predictions (Feb 2010) Introduction to Text Mining for practitioners (Jan 2012) 2012 Prose Award Winner 32 Guide for executives preparing to lead analytics initiatives (Sept 2016)

Training Services Analytics Concepts Course Describes leading algorithms, compares their merits, and briefly demonstrates their relative effectiveness in practical applications. Analytics Practitioners Course How to discover useful patterns and trends within data. Valuable practical advice on how to build reliable predictive models and interpret results with confidence. Tools Training Course Customized engagement based on the client s requirements. In-depth learning about common analytic software packages. Decision Analytics Professional Course Learn complex data access and manipulation techniques, add new tools to data crunching arsenal, and develop presentation skills. Culminates with a capstone project from the participants own department.