Designing Feedback Control Systems for Service Delivery Management
Yixin Diao
December 5, 2011
2011 Lund Workshop on Control of Computing Systems, Lund University, Sweden

Overview
IT Service Management
Workload Prioritization
Staffing Optimization

What is IT Service Management?
IT service management (ITSM) is a discipline for managing information technology (IT) systems from the perspective of the business and its customers.
ITSM stands in deliberate contrast to technology-centered approaches to IT management and business interaction.
ITSM is process-focused and includes the management processes and technologies that enable service providers to manage IT systems.
ITSM is generally concerned with the operational side of IT management, not with technology development or technical details.
The primary objective of ITSM is to ensure that IT services are aligned with business needs and actively support them.
It is also increasingly important that IT act as an agent for change to facilitate business transformation.

An Example of IT Service and IT Service Management
[Figure: customers, salespeople, and the supporting IT infrastructure, applications, and network components]
Customer outcome: salespeople spend more time interacting with customers.
IT service: a remote access service that enables reliable access to corporate sales systems from salespeople's computers.
A service is a means of delivering value to customers by facilitating outcomes customers want to achieve without the ownership of specific costs and risks.
IT Service Management provides organizational capabilities for delivering value to customers in the form of services:
  the processes, methods, functions, roles, and activities that a service provider uses to facilitate the customer outcome;
  to understand the need, deliver the value, reduce the cost, and manage the lifecycle.

ITIL Service Lifecycle (Information Technology Infrastructure Library)
Financial Management, Service Portfolio Management, Demand Management
Service Catalogue Management, Service Level Management, Capacity Management, Availability Management, IT Service Continuity Management, Information Security Management, Supplier Management
Event Management, Incident Management, Problem Management, Request Fulfillment, Access Management
Change Management, Service Asset Management, Knowledge Management, Transition Planning and Support, Release and Deployment Management, Service Validation and Testing, Evaluation

An Example of Service Delivery Systems
[Figure: service delivery system. Labels: Customers, Account Teams, Quality Team, Ticketing Systems, Problem/Change Tickets, Work Orders, Service Requests, Project Requests, Service Orders, Primary Dispatchers, Project Dispatchers, Dispatching Management Systems, Ticket Assignment, System Administrators, Server Systems, Server Data. Annotations: Workload Prioritization, Staffing Optimization.]

A Process View of Service Delivery
(1) Creation
(2) Classification
(3) Dispatching
(4) Resolution
(5) Reporting

Workload Prioritization
Closed-loop performance management: close the gap between incident, problem, and change (IPC) dispatching and performance management so that the two work together.
Incident, Problem & Change Dispatching: dispatching decisions are based on skill requirements, ticket severity, SLA target time, etc., but not on meeting SLA attainment levels.
Performance Management: measurements focus on productivity, mean time to resolve, and SLA attainment levels.

Simulation Setup

Average Ticket Volume per Week by Severity
        Sev1    Sev2    Sev3     Sev4     Total
C1      0.94    29.69   259.11   221.41   511.15
C2      6.61    5.24    115.33   163.96   291.23
C3      0.27    1.23    37.34    18.77    57.61

Service Level Targets by Severity
        Sev1        Sev2        Sev3         Sev4
C1      95% in 4C   90% in 8C   80% in 12C   80% in 24C
C2      99% in 2C   99% in 4C   95% in 8C    90% in 16C
C3      95% in 4C   90% in 8C   85% in 24C   85% in 48C

A simple pool with a small number of customers is used for illustration.
Service level targets are similar but not identical across accounts.

Open Loop Performance Management - Prioritization based on static SLA objectives

Service Level Targets by Severity
        C1           C2           C3
Sev1    95% in 4C    99% in 2C    95% in 4C
Sev2    90% in 8C    99% in 4C    90% in 8C
Sev3    80% in 12C   95% in 8C    85% in 24C
Sev4    80% in 24C   90% in 16C   85% in 48C

Method 1: Severity Based Dispatching (SLA attainment)
        C1      C2      C3
Sev1    96%     78%     98%
Sev2    100%    95%     100%
Sev3    100%    99%     100%
Sev4    91%     74%     96%
Prioritization is based on the severity level.
Within the same severity level, the account with the shortest target time tends to miss the SLA.

Method 2: Earliest Deadline First (SLA attainment)
        C1      C2      C3
Sev1    94%     86%     100%
Sev2    99%     96%     98%
Sev3    99%     98%     98%
Sev4    84%     96%     75%
Prioritization is based on the target time.
Within the same severity level, the account with the longest target time tends to miss the SLA.

Without acting on the SLA attainment level during dispatching, the SLA targets may be missed at the end of the month.
This may occur more often when the pool is close to the optimal staffing level and resources are therefore constrained.

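The two open-loop rules differ only in the key used to order the ticket queue. The following is a minimal sketch, not the dispatcher used in the simulation: the Ticket fields, the tie-break on arrival time, and the sample tickets are hypothetical, and target times are treated as plain hours, which is an assumption about the unit written as "C" on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    ticket_id: str   # hypothetical identifier
    customer: str    # account, e.g. "C1"
    severity: int    # 1 is the most severe
    arrival: float   # arrival time (assumed hours)
    target: float    # SLA target time (assumed hours)

    @property
    def deadline(self) -> float:
        return self.arrival + self.target


def severity_order(queue):
    """Method 1: most severe first; ties broken by arrival time (assumption)."""
    return sorted(queue, key=lambda t: (t.severity, t.arrival))


def edf_order(queue):
    """Method 2: earliest deadline first; ties broken by arrival time (assumption)."""
    return sorted(queue, key=lambda t: (t.deadline, t.arrival))


# Made-up tickets; target times follow the service level targets table.
queue = [
    Ticket("a", "C1", 3, 0.0, 12.0),
    Ticket("b", "C2", 3, 1.0, 8.0),
    Ticket("c", "C3", 4, 0.0, 48.0),
    Ticket("d", "C1", 1, 2.0, 4.0),
    Ticket("e", "C2", 4, 0.5, 16.0),
]

by_severity = [t.ticket_id for t in severity_order(queue)]
by_deadline = [t.ticket_id for t in edf_order(queue)]
print(by_severity)  # ['d', 'a', 'b', 'c', 'e']
print(by_deadline)  # ['d', 'b', 'a', 'e', 'c']
```

Neither ordering looks at how well each account is currently attaining its SLA, which is the gap noted on this slide.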
Closed Loop Performance Management - Prioritization based on dynamic SLA attainment feedback

Service Level Targets by Severity
        C1           C2           C3
Sev1    95% in 4C    99% in 2C    95% in 4C
Sev2    90% in 8C    99% in 4C    90% in 8C
Sev3    80% in 12C   95% in 8C    85% in 24C
Sev4    80% in 24C   90% in 16C   85% in 48C

Closed Loop Performance Management (SLA attainment)
        C1      C2      C3
Sev1    97%     77%     99%
Sev2    100%    96%     100%
Sev3    99%     100%    100%
Sev4    82%     93%     89%

[Figure: time series of SLA attainment, control error, and priority (levels 1 to 5) for C1, C2, and C3]

Priority is determined dynamically during dispatching.
Prioritization is based on which account may miss its SLA attainment target by the most.
All SLAs can be met (given the right staffing) with similar safety margins.

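The feedback idea can be illustrated with a small calculation: measure attainment so far, subtract it from the target to get a control error, and serve the account with the largest error first. The slide does not give the control law, so this is only a sketch under stated assumptions: the function names are hypothetical, the ticket counts are made up, and ranking directly on the raw error stands in for whatever controller maps the error to priority levels.

```python
def attainment(met, closed):
    """Fraction of closed tickets that met their target time.

    With no closed tickets yet, the account is treated as fully attained
    (an assumption made for this sketch).
    """
    return 1.0 if closed == 0 else met / closed


def control_errors(targets, counts):
    """Control error per account: target attainment minus measured attainment."""
    return {acct: targets[acct] - attainment(*counts[acct]) for acct in targets}


def dispatch_priority(errors):
    """Accounts ordered from largest to smallest control error."""
    return sorted(errors, key=errors.get, reverse=True)


# Sev4 attainment targets from the table; (met, closed) counts are made up.
targets = {"C1": 0.80, "C2": 0.90, "C3": 0.85}
counts = {"C1": (82, 100), "C2": (85, 100), "C3": (89, 100)}

errors = control_errors(targets, counts)
order = dispatch_priority(errors)
print(order)  # ['C2', 'C1', 'C3']
```

In this made-up snapshot only C2 is below target, so its tickets are served first; as its attainment recovers, its error shrinks and the ordering changes.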
Remarks on Closed-Loop Performance Management (CPM)
Controller design:
  Simple to design and implement
  No need for sophisticated models
  No significant (historical) data requirements
Performance:
  Effective in meeting the SLAs
  Suitable for multiple SLC pool types
  Robust to request arrival variations

Modeling Service Delivery Systems
[Figure: service delivery system. Labels: Customers, Account Teams, Quality Team, Native IPC Systems, Problem/Change Tickets, Work Orders, Service Requests, Project Requests, Service Orders, Primary Dispatchers, Project Dispatchers, Dispatching Management Systems, Ticket Assignment, System Administrators, Server Systems, Server Data.]

Optimization Model: Overview
[Equation: labor-cost minimization subject to (s.t.) service level and staffing coverage constraints]
Objective function: minimize the labor cost.
Service level constraint: meet the SLA requirements given the system dynamics.
Staffing coverage constraints: meet the local regulatory requirements on staffing.
Decision variable: number of agents per shift per skill level.

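The annotations on this slide (a labor-cost objective, a service level constraint, staffing coverage constraints, and agents per shift per skill level as the decision variable) suggest a formulation of the following general shape. This is a hedged sketch, not the model from the talk: every symbol is an assumption introduced for illustration. Here x_{jk} is the number of agents in shift j at skill level k, c_{jk} is the cost per agent, f_i is the SLA attainment for service class i as a function of staffing, \alpha_i is its attainment target, and a_{kr}, b_{jr} encode coverage requirement r.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{j=1}^{J} \sum_{k=1}^{K} c_{jk}\, x_{jk} \\
\text{s.t.} \quad & f_i(x) \ge \alpha_i, && i = 1, \dots, I \\
& \sum_{k=1}^{K} a_{kr}\, x_{jk} \ge b_{jr}, && j = 1, \dots, J,\; r = 1, \dots, R \\
& x_{jk} \in \mathbb{Z}_{\ge 0}, && j = 1, \dots, J,\; k = 1, \dots, K
\end{aligned}
```

Under these assumptions the service level constraint has no closed form because f_i depends on the system dynamics, which is why it would have to be evaluated numerically rather than solved for directly.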
Simulation-optimization Approach to Determine Optimal Solutions
Proposed solution (example): Shift 1: (15L, 6M, 4H); Shift 2: (10L, 6M, 1H); Shift 3: (3L, 0M, 2H)
Evaluate the feasibility of the proposed solution against the multiple optimization constraints.
Simulation is used to evaluate complex constraints (e.g., SLA attainment).
If the solution is deemed feasible, its total cost is evaluated to determine whether it is an improvement.
If there is no improvement, the proposed solution is rejected; if there is an improvement, it becomes the new best solution.
Scatter search and tabu search are used to propose the next solution for evaluation.

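The propose, check feasibility, compare cost loop described on this slide can be sketched in a few lines. This is an illustration only: a plain greedy neighborhood descent is swapped in for scatter search and tabu search, the simulate_attainment function is a toy stand-in for the real simulation, staffing is simplified to one agent count per shift with no skill levels, and all names and numbers are hypothetical.

```python
def simulate_attainment(staffing, demand):
    """Toy stand-in for the simulation: worst per-shift ratio of agents to demand, capped at 1."""
    return min(min(1.0, agents / load) for agents, load in zip(staffing, demand))


def cost(staffing):
    """Labor cost, assuming every agent costs the same."""
    return sum(staffing)


def neighbors(staffing):
    """Candidate solutions: remove one agent from one shift."""
    for i, agents in enumerate(staffing):
        if agents > 0:
            yield staffing[:i] + (agents - 1,) + staffing[i + 1:]


def optimize(start, demand, target, max_iters=100):
    """Greedy descent: accept a candidate only if it is feasible and cheaper."""
    if simulate_attainment(start, demand) < target:
        raise ValueError("starting solution must be feasible")
    best = start
    for _ in range(max_iters):
        improved = False
        for candidate in neighbors(best):
            if simulate_attainment(candidate, demand) < target:
                continue  # infeasible: reject
            if cost(candidate) < cost(best):
                best = candidate  # improvement: new best solution
                improved = True
                break
        if not improved:
            break  # no feasible cheaper neighbor remains
    return best


# Made-up demand per shift and attainment target.
result = optimize(start=(15, 12, 8), demand=(10, 8, 4), target=0.85)
print(result, cost(result))  # (9, 7, 4) 20
```

A greedy descent like this can stall in a local optimum when constraints interact across shifts, which is one reason to prefer metaheuristics such as scatter search and tabu search for proposing candidates.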
Convergence Process
[Figure: number of agents versus number of iterations, showing candidate solutions and best feasible solutions]
Convergence from 40 agents to 32 in 100 iterations.

Conclusions and Future Work
Our approach
  Closed-loop performance management for service workload prioritization
  Staffing optimization in service delivery systems
Benefits
  Simple and effective approaches that are accurate and scalable for global deployment
Future work
  Continue growing our use of analytical approaches for service quality improvement