Analysis and Modeling of Time-Correlated Failures in Large-Scale Distributed Systems
1 Analysis and Modeling of Time-Correlated Failures in Large-Scale Distributed Systems
Nezih Yigitbasi (1), Matthieu Gallet (2), Derrick Kondo (3), Alexandru Iosup (1), Dick Epema (1)
(1) TU Delft, (2) École Normale Supérieure de Lyon, (3) INRIA
The Failure Trace Archive
Delft University of Technology: Challenge the future
2 Failures Do Happen
- Build a computing system with 10 thousand servers with MTBF of 30 years each, watch one fail per day (Jeff Dean, Google Fellow, LADIS '09 keynote)
- Average worker deaths per MapReduce job: 1.2 (MapReduce, OSDI)
- % failures in TeraGrid (Khalili et al., GRID '06)
- During the month of March 2005 on one dedicated cluster with 1500 Xeon CPUs, there were 32,580 Sawzall jobs launched, using an average of 220 machines each. While running those jobs, 18,636 failures occurred (application failure, network outage, system crash, etc.) that triggered rerunning some portion of the job... (Rob Pike et al., Google)
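The "one failure per day" figure in the first quote can be checked with a short calculation. This is a minimal sketch, assuming independent servers with a constant failure rate of 1/MTBF; the function name is illustrative and does not come from the slides.

```python
# Back-of-the-envelope check of the "watch one fail per day" claim.
# Assumption (not stated on the slide): servers fail independently,
# each at a constant rate of 1/MTBF.

def expected_failures_per_day(servers: int, mtbf_years: float) -> float:
    """Expected number of fleet-wide failures per day."""
    mtbf_days = mtbf_years * 365.0
    return servers / mtbf_days


rate = expected_failures_per_day(servers=10_000, mtbf_years=30)
print(f"{rate:.2f} failures per day")  # prints "0.91 failures per day"
```

Under these assumptions the fleet sees roughly 0.9 failures per day, which matches the quoted rule of thumb.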
3 Are Failures Independent?
- Independence is a common assumption. Is it realistic for large-scale distributed systems?
- We already know that space correlations exist
- Time correlations may impact:
  - Proactive fault-tolerance solutions
  - Design decisions
  - Checkpointing & scheduling decisions (e.g., migrate computation at the beginning of a predicted peak)
Reference: M. Gallet, N. Yigitbasi, B. Javadi, D. Kondo, A. Iosup, D. Epema, "A Model for Space-correlated Failures in Large-scale Distributed Systems", Euro-Par
4 Our Goals
- GOAL 1: Investigate whether failures have time correlations
- GOAL 2: Model the time-varying behavior of failures (peaks)
5 Outline
- Background
- Our Approach
- Analysis of Time-Correlation
- Modeling the Peaks of Failures
- Conclusions
6 Why Not Root-Cause Analysis?
- Root-cause analysis is definitely useful
- Challenges:
  - Systems are large and complex
  - Not all subsystems provide detailed info
  - Little monitoring/debugging support
  - Environment-specific or temporary failures
  - Huge size of failure data (19 systems)
7 Failure Trace Archive (FTA)
Provides:
- Availability traces of diverse distributed systems of different scale
- Standard format for failure events
- Tools for parsing & analysis
Enables:
- Comparing models/algorithms using identical data sets
- Evaluation of the generality/specificity of models/algorithms across different types of systems
- Analysis of availability evolution across time scales
- And many more
8 FTA Schema
- Hierarchical trace format
- Resource centric
- Event-based
- Associated metadata
- Codes for different components and events
- Available in raw, tabbed, and MySQL formats
9 Sample Trace
[Figure: sample trace with the following fields labeled]
- Identifiers for the event/component/node/platform
- Type of event: unavailability/availability
- Event start/stop time (UNIX time)
- Node name
11 Our Approach (1): Outline
- Traces: nineteen failure traces from the FTA, mostly production systems
- Analysis: use the auto-correlation of the failure rate time series
- Modeling: fit well-known probability distributions to the failure data to model failure peaks
12 Our Approach (2): Traces
- 100K+ hosts
- ~1.2 M failure events
- 15+ years of operation in total
13 Our Approach (3): Analysis
- Auto-Correlation Function (ACF): similarity between observations as a function of the time lag between them
- Mathematical tool for finding repeating patterns
- Used for assessing time correlations
- Values lie in [-1, 1]: values near 0 indicate weak correlation, values near -1 or 1 indicate strong correlation
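As an illustration of the ACF, the sketch below computes the standard sample auto-correlation of a failure-rate time series at a given lag. The hourly counts are made-up data with a 4-hour cycle, and the function name is hypothetical; the slides do not say which estimator or tooling the authors used.

```python
from typing import List


def autocorrelation(series: List[float], lag: int) -> float:
    """Sample auto-correlation of `series` at a non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Total variation around the mean; used to normalize into [-1, 1].
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; auto-correlation is undefined")
    # Co-variation between each observation and the one `lag` steps later.
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


# Hypothetical hourly failure counts that repeat every 4 hours.
failures_per_hour = [0.0, 1.0, 5.0, 1.0] * 6
acf = [autocorrelation(failures_per_hour, lag) for lag in range(9)]
for lag, value in enumerate(acf):
    print(f"lag {lag}: {value:+.3f}")
```

For this toy series the ACF is high at lags 4 and 8 (the period of the cycle) and negative at lag 2, which is the kind of repeating pattern the ACF is meant to expose.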
14 Our Approach (4): Modeling
- We fit five probability distributions to the empirical data: Exponential, Weibull, Pareto, Log-Normal, and Gamma
- Maximum likelihood estimation + goodness-of-fit tests
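The fitting procedure can be sketched for two of the five distributions, Exponential and Log-Normal, whose maximum likelihood estimates have closed forms. This is an illustrative sketch on synthetic data, not the authors' code: it reports only the Kolmogorov-Smirnov distance as a goodness-of-fit measure and omits the p-values mentioned in the slides. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import Callable, List, Tuple


def fit_exponential(data: List[float]) -> float:
    """MLE of the exponential rate: 1 / sample mean."""
    return len(data) / sum(data)


def fit_lognormal(data: List[float]) -> Tuple[float, float]:
    """MLE of log-normal (mu, sigma): mean and population std of log(x)."""
    logs = [math.log(x) for x in data]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def ks_statistic(data: List[float], cdf: Callable[[float], float]) -> float:
    """Kolmogorov-Smirnov distance between the empirical CDF and `cdf`."""
    xs = sorted(data)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        distance = max(distance, (i + 1) / n - f, f - i / n)
    return distance


# Synthetic "inter-arrival times" drawn from a log-normal distribution.
rng = random.Random(42)
sample = [rng.lognormvariate(1.0, 1.5) for _ in range(2000)]

rate = fit_exponential(sample)
mu, sigma = fit_lognormal(sample)
fitted_normal = NormalDist(mu, sigma)

ks_exp = ks_statistic(sample, lambda x: 1.0 - math.exp(-rate * x))
ks_logn = ks_statistic(sample, lambda x: fitted_normal.cdf(math.log(x)))
print(f"KS distance, exponential fit: {ks_exp:.3f}")
print(f"KS distance, log-normal fit:  {ks_logn:.3f}")
```

A smaller KS distance indicates a closer fit. On this synthetic sample the Log-Normal fit is much closer than the Exponential one, by construction.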
16 Analysis (1): Auto-correlation
[Figure: auto-correlation for the WEBSITES trace]
- Many systems exhibit moderate/strong auto-correlation for moderate/short time lags (GRID5K, LDNS, SKYPE, ...)
17 Analysis (2): Auto-correlation
[Figure: auto-correlation for the TERAGRID trace]
- A small number of systems exhibit low auto-correlation (TeraGrid, PNNL, NOTRE-DAME)
18 Analysis (3): Failure Patterns
[Figures: MICROSOFT and SKYPE traces, each showing daily/weekly cycles]
- Systems with similar usage patterns have similar failure patterns
19 Analysis (4): Workload Intensity vs Failure Rate
[Figure: GRID5000 trace]
- There is a strong correlation between the workload intensity and the failure rate in some systems
21 Failure Peaks (1): Model
[Figure: peak model, with the levels μ and μ + kσ marked]
22 Failure Peaks (2): Identification
- Our goal: balance between capturing the extreme system behavior and characterizing an important part of the system failures
- We use a threshold to isolate peaks: μ + kσ, where k is a positive constant
  - Large k => few periods, explaining only a small fraction of failures
  - Small k => more failures, of probably very different characteristics
- We use k = 1
  - Tried k = {0.5, 0.9, 1.0, 1.1, 1.25, 1.5, 2.0}
  - Over all traces, the average fraction of downtime and the average number of failures are close (see Technical Report)
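The thresholding rule can be sketched as follows. This is an illustrative sketch under stated assumptions: μ and σ are taken to be the mean and the population standard deviation of the failure-rate series, a peak is a maximal run of consecutive periods strictly above μ + kσ, and the hourly counts are invented. None of these details are confirmed by the slides.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def find_peaks(rate: List[float], k: float = 1.0) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of maximal runs
    where the failure rate is strictly above mean + k * std."""
    threshold = mean(rate) + k * pstdev(rate)
    peaks: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(rate):
        if value > threshold:
            if start is None:
                start = i  # a new peak begins here
        elif start is not None:
            peaks.append((start, i))  # the current peak ended at i - 1
            start = None
    if start is not None:
        peaks.append((start, len(rate)))  # peak runs to the end of the trace
    return peaks


# Hypothetical failures per hour.
failures_per_hour = [1, 0, 2, 1, 9, 12, 1, 0, 1, 2, 1, 10, 1, 0]
peaks = find_peaks(failures_per_hour, k=1.0)
in_peaks = sum(sum(failures_per_hour[s:e]) for s, e in peaks)
print(peaks)  # prints [(4, 6), (11, 12)]
print(f"{in_peaks / sum(failures_per_hour):.0%} of failures fall in peaks")
```

Raising k shrinks the set of peaks: with k = 2.0 only the single hour with 12 failures remains above the threshold, which illustrates the trade-off described on the slide.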
23 Failure Peaks (3): Modeling Results (1)
1. On average, 50% - 95% of the system downtime is caused by failures that originate during peaks, but the fraction of peaks is < 10% for all platforms
2. The average peak durations are on the order of 1-2 hours
3. The average time between peaks is on the order of hours
4. The average inter-arrival time (IAT) over the entire trace is about 9x the IAT during peaks
24 Failure Peaks (4): Modeling Results (2)
5. The Exponential distribution is not a good fit for the IAT during peaks, the time between peaks, and the failure duration during peaks. Traditional models are not enough.
6. Model parameters do not follow a heavy-tailed distribution: goodness-of-fit test results (p-values) for the Pareto distribution are very low
7. The Weibull and Log-Normal distributions provide the best fit (see the paper for the parameters)
25 Conclusions (1)
Large-Scale Study
- Nineteen traces, most of which are production systems
- 100K+ hosts, ~1.2 M failure events, 15+ years of operation
- Four new traces available in the FTA (3 CONDOR + 1 TERAGRID)
GOAL 1: Analysis
- Failures exhibit strong periodic behavior & time correlation
- Systems with similar usage patterns have similar failure patterns
- Strong correlation between workload intensity and failure rate
26 Conclusions (2)
GOAL 2: Modeling
- Modeled quantities: peak duration, time between peaks, the failure IAT during peaks, and the failure duration during peaks
- On average, 50% - 95% of the system downtime is caused by failures that originate during peaks (fraction of peaks < 10%)
- The Weibull and Log-Normal distributions provide a good fit
27 Thank you! Questions? Comments?
More information:
- Guard-g Project
- The Failure Trace Archive
- PDS publication database