Big Data in the Cloud: High Velocity Data in Media and Games Applications


1 Big Data in the Cloud: High Velocity Data in Media and Games Applications
George Whiffen, Technical Instructor, Web Services
4pm, Loughborough University, 28th Feb
Web Services, Inc. and its affiliates. All rights reserved.

2 Synopsis
- What is Big Data?
- Big Data and the Cloud
- The Big Data Pipeline
- High Velocity Data: Ingress, Storage and Processing
- Latest Developments: IoT, Deep Learning
- What next? You tell us!

3 Overview: Big Data
What is big data? The collection and analysis of large amounts of data to answer questions and create competitive advantages.

4 Overview: Big Data
When does data become big data? When data sets become so large that you have issues collecting, storing, organizing, analyzing, moving, and sharing them. The velocity, volume, and variety of data outgrow your ability to process it.

5 Big data use cases by industry
The ability to effectively analyze big data from multiple sources adds value across sectors.
- Media, advertising: Targeted advertising, Image and video processing
- Oil and Gas: Gas meters, Pipeline sensors
- Retail: Recommendations, Transaction analysis
- Consumer health: Bio-sensors, Clinical analytics
- Security: Anti-virus, Fraud detection, Image recognition
- Social media, Gaming: Demographics, Usage analysis, In-game metrics

6 CCA 1.02: Leveraging Cloud Computing, Part 2: Auto Scaling
How Data Volume Helped Create the Cloud
[Figure: November traffic to .com against provisioned capacity; chart labels: 76%, 24%]

7 Six Advantages & Benefits of Cloud Computing
- Trade capital expense for variable expense.
- Increase speed and agility.
- Benefit from massive economies of scale.
- Stop spending money on running and maintaining data centers.
- Stop guessing capacity.
- Go global in minutes.

8 CCA 1.01: What is Cloud Computing, Part 3: Infrastructure
Global Network - Regions and Availability Zones
[Map labels: Ireland, Ohio, Montreal, Frankfurt, Beijing, Oregon, N. California, UK, Paris (coming soon), N. Virginia, Ningxia (coming soon), Seoul, Tokyo, GovCloud, India, São Paulo, Singapore, Sydney]

9 Netflix: Scaling Real Time Video & Analytics

10 CCA 1.01: What is Cloud Computing, Part 3: Infrastructure
Foundation Services
Categories: Compute, Network, Storage, Security & Identity, Applications
Services shown: EC2, Lambda, VPC, Route 53, S3, CloudFront, Identity and Access Management, KMS, WorkDocs, WorkSpaces, EC2 Container Service, Direct Connect, Glacier, Directory Service, WorkMail, Elastic Load Balancing, Elastic Beanstalk, Elastic File System, Storage Gateway, Cloud HSM, WAF, Auto Scaling, Import/Export Snowball

11 CCA 1.01: What is Cloud Computing, Part 3: Infrastructure
Platform Services
Categories: Databases, Analytics, App Services, Management Tools, Developer Tools, Mobile Services, Internet of Things
Services shown: RDS, DynamoDB, EMR, Data Pipeline, SES, AppStream, CloudFormation, Config, CodeCommit, CodeDeploy, Cognito, Device Farm, IoT, ElastiCache, Redshift, Elasticsearch Service, Machine Learning, SWF, Elastic Transcoder, CloudTrail, Service Catalog, CodePipeline, SNS, Mobile Analytics, Database Migration Service, Kinesis, CloudSearch, SQS, OpsWorks, CloudWatch, Mobile Hub, API Gateway, Trusted Advisor, Certificate Manager

12 Big data architectural principles: Using the right tool
Consider these four things when determining the right tool for the job:
- Structure of your data
- Latency
- Throughput
- Access patterns

13 The Big Data "Pipeline"
Data -> Collect -> Store -> Process & Analyze -> Visualize -> Insight
Time-to-answer (latency): a balance of throughput and cost.
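The four pipeline stages can be sketched as plain Python functions chained together. This is a minimal in-memory illustration under stated assumptions, not an AWS API: the function names, the `player,score` input format, and the sample data are all hypothetical, and each function only stands in for whatever service a real pipeline would use at that stage.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

Event = Dict[str, Any]  # hypothetical record shape: {"player": str, "score": int}


def collect(raw_lines: Iterable[str]) -> Iterator[Event]:
    """Collect: parse 'player,score' lines, skipping malformed input."""
    for line in raw_lines:
        try:
            player, score = line.strip().split(",")
            event = {"player": player.strip(), "score": int(score)}
        except ValueError:
            continue  # wrong field count or non-numeric score
        yield event


def store(events: Iterable[Event], archive: List[Event]) -> Iterator[Event]:
    """Store: keep a copy of each event, then pass it downstream."""
    for event in events:
        archive.append(event)
        yield event


def analyze(events: Iterable[Event]) -> Dict[str, int]:
    """Process & Analyze: total score per player."""
    totals: Dict[str, int] = defaultdict(int)
    for event in events:
        totals[event["player"]] += event["score"]
    return dict(totals)


def visualize(totals: Dict[str, int]) -> List[str]:
    """Visualize: one text line per player, highest total first."""
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [f"{player}: {total}" for player, total in ranked]


# Hypothetical sample input; the last two lines are deliberately malformed.
raw = ["ann,3", "bob,5", "ann,4", "garbage", "bob,x"]
archive: List[Event] = []
report = visualize(analyze(store(collect(raw), archive)))
print(report)  # ['ann: 7', 'bob: 5']
```

Because the stages are generators, each event flows through collect and store one at a time, which mirrors the latency concern on the slide: nothing downstream has to wait for the whole input to arrive.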

14 MLBAM Baseball: Ingress of Real Time Data

15 Case study: MLBAM
MLBAM is the interactive media and Internet company of Major League Baseball.
- MLBAM sought a platform capable of scaling up and down quickly to handle data on game plays.
- MLBAM chose to ingest, analyze, and store 17+ petabytes of baseball data per season.
- It takes less than 12 seconds to capture, analyze, and deliver data to broadcasters for on-air analysis.
- Statcast scales to handle up to 13 games per day, or just one or two daily, and shuts down in the off season.
- [AWS] helps MLBAM deliver new data in new ways to attract more fans.
"By using [AWS] to power Statcast we're ensuring that our sport is relevant, important, and the center of life for the next generation of fans." Joe Inzerillo, Executive Vice President and CTO, MLB Advanced Media


17 Where do solutions map to the pipeline?
Pipeline stages: Collect, Store, Process & Analyze, Visualize
Solutions shown (category: service):
- Real-time: Kinesis Firehose
- Data Import: Import/Export Snowball
- Object Storage: S3, Glacier
- Real-time: Kinesis Streams
- Hadoop Ecosystem: EMR
- Real-time: Lambda, Kinesis Analytics
- Business Intelligence & Data Visualization: QuickSight
- Elastic Search Analytics: Elasticsearch
- Message Queuing: SQS
- RDBMS: RDS
- Data Warehousing: Redshift
- Web/app Servers: EC2
- NoSQL: DynamoDB
- Machine Learning: Machine Learning
- Search: CloudSearch
- IoT: IoT
- Process & Move Data: Data Pipeline

18 Big Data Ingress: Physical Transport

19 Big Data Ingress: Physical Transport - SUPERSIZED

20 Snowmobile

21 Putting it all together
[Diagram labels: Multiple access, Kinesis S3 Connector, Data, Insight, Kinesis, EMR, DynamoDB, S3, Redshift, QuickSight; stage labels: Storage, Processing, Storage, Access]
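One reading of the "Multiple access" label on this slide is that several downstream systems read the same stream independently, each at its own pace. That reading is an assumption about the diagram. Under it, the idea can be sketched with a toy append-only log that tracks one read position per consumer. The `Stream` class, its methods, and the consumer names are hypothetical and are not the Kinesis API.

```python
from typing import Dict, List


class Stream:
    """Append-only in-memory log; a toy stand-in for a streaming service."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}  # consumer name -> next index to read

    def put(self, record: str) -> None:
        """Producer side: append one record to the end of the log."""
        self._records.append(record)

    def read(self, consumer: str, limit: int = 10) -> List[str]:
        """Consumer side: return up to `limit` unread records for this consumer."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + limit]
        self._offsets[consumer] = start + len(batch)
        return batch


stream = Stream()
for record in ["pitch", "hit", "catch"]:
    stream.put(record)

first_archive = stream.read("archiver")                # reads everything so far
first_dashboard = stream.read("dashboard", limit=2)    # reads only two records

stream.put("run")
second_dashboard = stream.read("dashboard")  # resumes where it left off
second_archive = stream.read("archiver")     # sees only the new record
```

Reading never removes records from the log, so adding another consumer does not affect the ones already there; that is the property that lets one ingest path feed storage, processing, and access tools at the same time.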

22 Where do solutions map to the pipeline? (repeat of slide 17)

23 Low Latency Big Data: Scaling for Media AdTech
"We spend more on snacks than we do on DynamoDB." Valentino Volonghi, CTO, AdRoll
AdRoll, an online advertising platform, serves 50 billion impressions a day worldwide with its global retargeting platforms.
- AdRoll uses [AWS] to grow by more than 15,000% in a year
- Needed a high-performance, flexible platform to swiftly sync data for a worldwide audience
- Processes 50 TB of data a day
- Serves 50 billion impressions a day
- Stores 1.5 PB of data
- Worldwide deployment minimizes latency
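To get a feel for what these daily figures mean as sustained rates, the arithmetic can be worked through in a few lines. The inputs are the three numbers from the slide; everything else is an assumption made for illustration: decimal units (1 TB = 10^12 bytes), traffic spread evenly over the day, and all processed data attributed to impressions. Real traffic is bursty, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures quoted on the slide (decimal units are an assumption).
impressions_per_day = 50 * 10**9   # 50 billion impressions a day
bytes_per_day = 50 * 10**12        # 50 TB processed a day
stored_bytes = 15 * 10**14         # 1.5 PB stored

# Average sustained rates, assuming an even spread over 24 hours.
impressions_per_second = impressions_per_day / SECONDS_PER_DAY
megabytes_per_second = bytes_per_day / SECONDS_PER_DAY / 10**6

# Rough ratios, assuming all processed data belongs to impressions
# and that stored data is neither compressed nor discarded.
bytes_per_impression = bytes_per_day / impressions_per_day
days_of_volume_stored = stored_bytes / bytes_per_day

print(f"{impressions_per_second:,.0f} impressions/s")    # 578,704 impressions/s
print(f"{megabytes_per_second:,.0f} MB/s")               # 579 MB/s
print(f"{bytes_per_impression:,.0f} bytes/impression")   # 1,000 bytes/impression
print(f"{days_of_volume_stored:,.0f} days of volume")    # 30 days of volume
```

On these assumptions the platform averages over half a million impressions per second, which is why per-request latency and a worldwide deployment matter so much for this workload.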

24 Low Latency Big Data in Education: Data Driven Education
- Pittsburgh, Carnegie Mellon University
- Different language courses
- 18 million users
- 6 billion exercises per month
- 31 billion items in a single DynamoDB table

25 IoT Features

26 High Volume Processing in Genomics
- 90% of all DNA sequencing worldwide
- Single person's genomic data = 100Gb
- Global 24hr turnaround
- Goal to cut computation x100 ($1000 to $10)
- IoT: 270 billion data points per year
"Sequencing is eating the world!" (Alex Dickinson)

27 Big Data on [AWS]
- Immediate Availability. Deploy instantly. There is no hardware to procure and no infrastructure to maintain and scale.
- Broad and Deep Capabilities. Over 50 services and 100s of features to support virtually any big data application and workload.
- Trusted and Secure. Services are designed to meet the strictest requirements and are continuously audited, including for certifications such as ISO 27001, FedRAMP, DoD CSM, and PCI DSS.
- Hundreds of Partners and Solutions. Get help from a consulting partner or choose from hundreds of tools and applications across the entire data management stack.

28 2016 Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Web Services, Inc. Commercial copying, lending, or selling is prohibited. All trademarks are the property of their owners.