Reliability From A Service Provider s Perspective (Mark Adams, Senior Director: Reliability, Quality and Network Assets) May, 2009

Similar documents
Reliability Module. By: Alex Miller and Mark Robinson. Material Summarized from Reliability Module

RAM & LCC for railways Industry: What s really necessary to high performance achievement?

AMERICAN SOCIETY FOR QUALITY CERTIFIED RELIABILITY ENGINEER (CRE) BODY OF KNOWLEDGE

CERTIFIED RELIABILITY ENGINEER (CRE) BODY OF KNOWLEDGE MAP 2018

Evaluating Your Electrical Distribution System

Fiducia TG Reliability Life Cycle Management (RLCM)

Introduction to RAM. What is RAM? Why choose RAM Analysis?

PUMP STATION CONDITION ASSESSMENT PROGRAMS: CASE STUDIES IN NORTH AND SOUTH CAROLINA

System Reliability Theory: Models and Statistical Method> Marvin Rausand,Arnljot Hoylanc Cowriaht bv John Wilev & Sons. Inc.

Maintenance Management for Reliability Training Course Day 1 Content. Disclaimer slides do not replace education, training, skills and experience

DFR ROI: Calculating ROI When Implementing a DFR Program

T25 - Leverage the Digital Enterprise to Maximize Asset Performance

IT Strategic Alignment Benchmark

OVERVIEW. AVL Reliability Engineering & Load Matrix

AN OVERVIEW OF RELIABILITY

AMI information. for improved outage management

Outline. Concept of Development Stage. Engineering Development Stage. Post Development Stage. Systems Engineering Decision Tool

Contents An Introductory Overview of ITIL Service Lifecycle: concept and overview...3 I. Service strategy...6 The 4 P's of ITIL Service

RELIABILITY, AVAILABILITY AND MAINTAINABILITY CONCEPTS

JOB DESCRIPTION. Head of Maintenance. Estates and Facilities Division. GRADE: Grade 8

FMEA Failure Mode Effects Analysis. ASQ/APICS Joint Meeting May 10, 2017

Frameworx 11 Certification Report Business Process Framework Release 9.0

Risk Management Tools and Techniques

QuEST Forum. TL 9000 Quality Management System. Requirements Handbook

System Reliability and Challenges in Electronics Industry. SMTA Chapter Meeting 25 th September 2013, India

<Full Name> Quality Manual. Conforms to ISO 9001:2015. Revision Date Record of Changes Approved By

Characteristics of a Robust Process

How to digitally transform your manufacturing operation. Manufacturing sector whitepaper

Ensuring Near-Perfect On-Time Performance

Apprenticeship Standard for High Speed Rail and Infrastructure (HSRI) Advanced Technician

Spokane River Assessment

COPYRIGHTED MATERIAL RELIABILITY ENGINEERING AND PRODUCT LIFE CYCLE 1.1 RELIABILITY ENGINEERING

RELIABILITY, AVAILABILITY, MAINTAINABILITY (RAM)

Accelerate and assure wireless services with intelligent solutions for wireless network and service management.

Savings Show Success of IT Service Management Initiative

Pass4sure.ITIL-F.347.QA

ENG-301 Reliability and Maintainability Engineering Overview

EXIN.ITIL _by.getitcert.com Mine Modified

SUPPLIER PERFORMANCE PROGRAM

Big Data & Analytics for Wind O&M: Opportunities, Trends and Challenges in the Industrial Internet

CHANGE HISTORY. Version Change Details Author Date. Track the sections that you change and give a summary of any key updates made.

Managed IT Services. Eliminating technology pains for small businesses

StableNet Enterprise. Automated IT Management & Business Service Assurance

RAM Development for Gasification and IGCC Plants

Information Technology Engineers Examination. Information Technology Service Manager Examination. (Level 4) Syllabus

Activity 1 Failure Mode and Effects Analysis (FMEA)

Risk Management in the 21 st Century Ameren Business Risk Management

Reliability Engineering & Asset Management (REAM) IMechE Accredited CPD Courses

mywbut.com Software Reliability and Quality Management

SESSION 207 Wednesday, November 1, 11:30am - 12:30pm Track: People, Culture, and Value

Contents of the Failure Mode Effects Analysis the Plant Wellness Way Distance Education Course FMEA Training Online

StableNet Telco. Automated Service Assurance and Fulfillment

White Paper. Managed IT Services as a Business Solution

Managed IT Services. Eliminating technology pains for small businesses

Testing. Testing is the most important component of software development that must be performed throughout the life cycle

Supportability Management Best Practices in accordance with SD-22 DMSMS Management Guidelines

ROSAS Seminar RAMS in Railways. Wolfgang Berns 17 May 2017

GR-418-CORE Reliability Assurance for Fiber Optic Systems

Managed IT Services. Eliminating technology pains for small businesses

ASQ Reliability Division. Who we are & How we can help

NetBoss Technologies Integrated Service Assurance

Service Description for IP Implementation. Issue 1.0. Date

Optimizing Service Assurance with Vitria Operational Intelligence

Information Technology. Classification Band 5. Position Objective

Managed IT Services. Eliminating technology pains for small businesses

Top 35 Reasons You Need Contact Center Performance Management

The Worry-Free IT Investment

White Paper Remove the risk from your multi-vendor infrastructure

IT applications in Power Distribution - UPCL's agenda for reforms

Chapter 5: product Design and Quality Function Deployment

Syllabus: Reliability, Availability, Maintenance Strategies

EXIN ITIL Exam Questions & Answers

FTA SMS Update. APTA Board Members Conference. July 21, 2015

ITIL Qualification: MANAGING ACROSS THE LIFECYCLE (MALC) CERTIFICATE. Sample Paper 2, version 5.1. To be used with Case Study 1 QUESTION BOOKLET

Thoughts on Possible Reliability Work for TESLA

Financial Close Best Practices

Protecting Information Assets - Unit #9 - Business Continuity and Disaster Recovery Planning. MIS 5206 Protecting Information Assets

Determining a defensible preventive maintenance plan

NOSSIS Operations Support Systems (OSS) A suite of fully integrated OSS products

Lecture Outline. Copyright 2011 John Wiley & Sons, Inc. 4-2

Functional Safety: ISO26262

Airspace Systems Management in Transition

ITILF.exam.251q ITILF ITIL Foundation (syllabus 2011)

ITIL Foundation V3. Walaa Omar

Managed IT Services. Eliminating technology pains for small businesses

Managed IT Services Eliminating technology pains in small businesses

Managed IT Services. Eliminating technology pains for small businesses

Achieve greater efficiency in asset management by managing all your asset types on a single platform.

APM Reliability Classic from GE Digital Part of our On-Premise Asset Performance Management Classic Solution Suite

Strategic Plan for World Class. Reliability. Kate Kerrigan. Operations Director, GPAllied

DRIVING EFFICIENCIES: 6 STEPS TO IMPROVING ASSET PERFORMANCE IN MANUFACTURING

USING TELEMETRY TO MEASURE EQUIPMENT MISSION LIFE ON THE NASA ORION SPACECRAFT FOR INCREASING ASTRONAUT SAFETY

CRE Sample Test #2. e - μ μ k k = 0, 1, 2, 3,... P( x = k) = 1. Which answer is the expected value of the Poisson distribution shown below?

Optimao. In control since Control Systems Obsolescence: Support Strategies and Key Considerations

Software reliability poses a significant challenge for the Army as software is increasingly. Scorecard Reviews. for Improved Software Reliability

A technical discussion of performance and availability December IBM Tivoli Monitoring solutions for performance and availability

Enterprise SM VOLUME 2, SECTION 2.6: TROUBLE AND COMPLAINT HANDLING

Efficiency First Program

Chapter 4 Project Management

IT Optimization Award

Transcription:

Reliability From A Service Provider s Perspective (Mark Adams, Senior Director: Reliability, Quality and Network Assets) May, 2009

Observations from a Service Provider For many equipment suppliers, life has been tough in telecom these past years! Equipment suppliers have faced many challenges (e.g.): The infamous telecom bubble burst (starting 1997) Mergers, acquisitions and bankruptcies Legacy bread and butter products becoming obsolete & newer products not keeping up Recession of 2008 Many small & new international competitors are emerging in the U.S Equipment supplier stability and performance is a concern: Some may not survive? What about our base? What is the impact of massive program, resource & cost cutting? Will it impact their ability to support us? What is the impact to proactive performance improvement programs? How will newer players perform over time? 2

Level Set: Service Provider Networks Customers expect high performance from us & have choices We spend a lot of time, energy and money to build and operate our networks to meet customer needs (e.g. availability): Alternate power and redundant equipment to handle faults Diversity: Re-routing of traffic to handle circuit faults (e.g. ring architecture & IP routing) Extensive network management applications and large base of skilled operations resources We spend more money than we should to build and support our networks when supplier equipment reliability is not optimized: Both equipment suppliers and service providers could benefit by use of reliability engineering principles to reduce reliability failures & life cycle costs 3

Terminology - What is Reliability? Reliability is the probability that an item can perform its intended function for a specified interval under -t stated conditions: R MTBF () t = e In a word, Reliability is Dependability. Reliability Engineering and Design for Reliability is the discipline focused on making it happen. Example Faults Infant Mortality Useful Life Wear-out Poor Design Poor Inspection and Quality Control Unanticipated Variation in Environment Unanticipated Variation in Application Degradation Fatigue Poor Manufacturing Process Unanticipated Variation in Materials Corrosion 4

Terminology - What is Availability? Availability is the proportion of time a system is in its operational state to perform its intended function. Availability is affected by two key factors: The robustness of the design (mean time between failure - MTBF) The speed the repair crew can bring the product back online to provide the service (mean time to restore - MTTR) Availability (%) MTBF = x100 MTBF+ MTBR Let s define service availability as performance from the end user perspective Is the cable modem ready when the customer wants to connect to the Internet? 5

Some Basic Design Reliability Math Its easy to do the math, if you have the right data for your population which is simply equipment installation and equipment failure date. Reliability function R(t) Probability an item survives up to some time, t1: Equals one minus the unreliability In general, at time t = 0, R(t) = 1 and as time t approaches infinity, R(t) approaches 0. R( t) = 1 F( t) = 1 f( t) dt= f( t) dt t 0 t Probability Density Function 1 Reliability Function Hint: Pay attention to this? f(t) Reliability, R(t) R(t) t 1 t Key Finding: Many suppliers currently do not do this math or perform it inadequately or incorrectly. 0 0 t 1 t 6

A Word About Reliability Reliability is becoming more and more important to service providers: Lower Reliability can impact: Customer satisfaction Revenue Life cycle costs Equipment and maintenance cost to maintain service Bottom Line Impact $$$ The balance in the market is shifting In some segments, revenue growth is slowing reliability can help maintain profitability Competition is fierce: Customers have more choices Customers demand service performance Systems are becoming more complex Service provider Support costs are increasing 7

Example: Why is Reliability Important? Most service providers build networks to maintain high service availability Therefore, reliability failures frequently do not result in customer impact Reliability failures do increase cost and operational effort!! Key: Maintain high availability but focus suppliers on DFR!! Reliability Cost of operation =$$ s Close the gap & reduce cost & improve operational efficiency 0 1 2 3 4 5 Time (Years) 8

What Should Providers Expect From Suppliers? Demonstrate a formal senior leadership sponsored reliability policy and comprehensive program. Reliability practices must begin early in the design process: They must be well integrated into the product development cycle Use science based methods to design reliability into products/processes Understand when, what and where to use the wide variety of reliability engineering tools available Reliability tracking and improvement must continue after the product is released and through to retirement: Implement a failure reporting, analysis and corrective action system (FRACAS). Focus on continuous improvement Implement an audit & compliance system to ensure goals are met Feed results to your development process for continuous improvement Expect Suppliers to agree to contractual R(t) targets 9

Steps to Reliability in Design Observation: Many suppliers do not have adequate programs in-place! 1. State goals 2. Define the product/functional operation with functional block diagrams 3. Use reliability block diagrams to describe the relationships of the various system elements (e.g. series, parallel, etc.). 4. Develop a reliability model of the system: Collect/test part and subsystem reliability data (e.g. Halt) Adjust data to fit the special operating conditions of your system Predict reliability using a mathematical model Assign reliability allocations to system and sub-systems 5. Verify, assess risk & improve designs using reliability tools (e.g.): Fault Tree Analysis (FTA) and/or Failure Mode & Effects analysis (FMEA) Stress factors and parameter diagrams (P-Diagrams) Field data (if available) Complete reliability growth and design changes to achieve/exceed goals 10

How Can Service Providers Drive Reliability? 1. Implement an end to end supplier performance program 2. Use formal design for reliability in your new product development process 3. Implement an operational performance tracking system 4. Analyze information to identify and prioritize opportunities 5. Implement corrective and preventive action programs 6. Validate results and ensure accountability 11

Step 1: Supplier Performance Program 1) Select reliable suppliers and associated products: Insert reliability criteria into the vendor/product selection process as a factor Validate reliability claims as stated using DFR and process 2) Formalize supplier performance agreements: Contractual agreement on key performance attributes (SLA) $ Liquidated damages apply 3) Implement a performance program for your suppliers: Regularly measure all elements of the relationship via a formal scorecard Hold regular Vendor/Cox reviews Implement action plans for areas of non-compliance Governance and validation Weight Q3 Performance Measures Focus Area Curr. Quarter Status Weighted Score 10% Purchasing 40 4 20% Software Quality 70 14 10% Hardware Quality 80 8 30% System Availablility 100 30 10% Technical Support 90 9 5% Outage Restoration 90 5 15% Problem Report Fix Response Time 60 9 OVERALL Quarterly Score: 79 12

Step 2: New Product/Service Process Implement formal design for reliability tasks as part of your formal new product/service/integration development process: 13

Step 3: Measurement Systems Implement comprehensive measurement systems that tracks all areas of performance: Supplier performance scorecards with a service level agreement compliance module @ supplier/product level: Hardware return rates and true reliability measures Software performance attributes Service and support (time to answer, restore and resolve) A true reliability database to track devices (e.g.): MTBF Cumulative survival and failure rate functions Reliability centered maintenance attributes Internal focused scorecards to track performance (e.g.): Trouble call rates and associated measures (e.g. repeats) Availability Time to restore External scorecards published to key customers 1.000 0.800 0.600 0.400 0.200 Reliability vs Time Plot 0.000 0.000 1 2 Time, (t) 3 4 5 14

Step 4-5-6: Analysis, Improvement & Compliance Analyze data and prioritize opportunities Implement improvement projects to address opportunities Implement a compliance tracking system: Reviews Audits Accountability (e.g. vendor penalties, internal follow-up, etc.) Feedback mechanism to appropriate functions All others 51% Example: 7% of these nodes are causing 50% of the Alarms why? NE4887 3% NE4822 10% NE4710 7% NE4878 5% NE4713 4% NE4726 3% NE4715 5% NE4705 4% NE4720 4% NE4708 4% Other components 3% Power Supply 3% Reverse Amp 3% NPF & OOA/OOS 28% Table of Survival Probabilities Modulators 95.0% Normal CI Time Probability Lower Upper 8760 0.932408 0.917617 0.944624 Some Content May be the Opinion of the Author and Not Necessarily Cox Communications 43800 0.378523 0.340865 0.416074 All Rights Reserved GaAs 9% Example: One hardware module is causing majority - why? Diplex filters 1% Others 5% Capacitors 48% Example: Capacitor Overstress? 15

In Summary - How to Ensure Reliability A Reliability Program: A Systematic Methodology : 1) Select Reliable Vendors Vendor Performance Use reliability as part of vendor/product selection criteria and validate claims Establish contractual reliability targets/penalties Compliance and Control Validation 6) Validate performance and cost action plans for effectiveness Feedback results to vendor performance, development and operational performance groups Assess Vendor penalties and next steps feedback to Supply Chain < > Standardized & Disciplined Product Release Process 2) Design for Reliability Operations Readiness Allocate reliability goals for the solution Test reliability with formal DFR methods as part of the new product introduction process Improvement Plans Vendor Actions Internal/Ext. Actions Reliability Actions 5) > 3) Use reliability & quality methods to drive continuous improvement (Six Sigma) < Measurement Systems Vendor Performance Internal/Ext. (e.g. NRSC) Reliability Tracking Vendor scorecards with SLA compliance Standard internal and external network/service performance scorecard Need to track true reliability and cost performance (reliability Db module) Analysis Opportunities 4) Use Reliability engineering and Quality analysis tools to identify issues Prioritize issues and plan of attack 16