ATHABASCA UNIVERSITY AN INTELLIGENT AGENT-BASED APPROACH TO NETWORK MANAGEMENT. Munir Ahmad. MASTER OF SCIENCE in INFORMATION SYSTEMS


ATHABASCA UNIVERSITY

AN INTELLIGENT AGENT-BASED APPROACH TO NETWORK MANAGEMENT

BY Munir Ahmad

An essay submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in INFORMATION SYSTEMS

Athabasca, Alberta
April 2015

Munir Ahmad, 2015

DEDICATION

This essay is dedicated to my wife, Yasamin, my personal cheerleader, who has been a constant source of support and encouragement during the challenges of graduate school, work and life. This work is also dedicated to my parents, for their unconditional love, support and encouragement, and for teaching me to work hard for the things that I aspire to achieve.

ABSTRACT

Multi-agent systems (MAS) are becoming more dominant in the world of information systems. Our day-to-day life is increasingly influenced by the rapid growth of complex, interconnected network-based systems. Effective event and fault management in such systems increases network uptime and improves customer experience. The ever-increasing expectation of high-availability systems, together with competitive pressure, motivates service providers to look for new ways of managing their infrastructure. The industry needs to adopt some form of automated fault management in order to continue providing high-availability, self-healing networks. Agent-based technology is a powerful technology for the deployment of distributed systems in dynamic environments. An agent-based approach to network management is particularly suitable for complex, dynamic and interconnected networks. This essay reviews current research on, and the effectiveness of, multi-agent systems in network and server incident management, focusing on computer networks operated by Internet Service Providers (ISPs) to deliver services such as internet, video and voice. Based on the review, the problems of the existing methods and approaches are identified. The essay concludes by making recommendations to solve the problems identified for such systems.

ACKNOWLEDGMENTS

First and foremost, I would like to express my sincere gratitude to my advisor, Dr. Fuhua Lin, for the useful comments, remarks and engagement throughout the process of learning about multi-agent systems and writing this master's essay. I would also like to take this opportunity to thank the faculty and staff at Athabasca University's School of Computing and Information Systems for all their support during my graduate studies. Last but not least, I would like to thank my family, friends and colleagues for their support, both by keeping me harmonious and by encouraging me with their best wishes.

TABLE OF CONTENTS

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Review of Related Literature
Chapter 3 Methodology
Chapter 4 Design
Chapter 5 Conclusions and Future Research
References

LIST OF TABLES

1. Scenarios
2. Initial Goals
3. Detailed Goals
4. Functionality Descriptors
5. Percepts
6. Actions
7. Event Functionality
8. Resource Functionality
9. Ticketing System Functionality
10. Administrative Functionality

LIST OF FIGURES

1. High Level Fault Detection, Treatment, and Resolution Workflow
2. High Level Event Detection, Treatment, and Resolution Process
3. System Overview Diagram
4. Data Coupling Diagram
5. Agent Acquaintance Diagram
6. Interaction diagram when a new incident arrives
7. Interaction diagram when a new incident arrives requiring escalation
8. Main Interaction Protocol
9. Final System Overview Diagram

CHAPTER 1 INTRODUCTION

The focus of this essay is research into the use of multi-agent systems (MAS) for network incident management. Because there is little current research on the use of MAS for network incident management, this essay draws on the use of MAS for automated power system restoration. The main contribution of this essay is the design of a multi-agent system for network incident management. The suggested method aims to reduce manual tasks by automating incident correction and, in turn, to improve network stability.

A multi-agent system is composed of a set of agents in an environment collaborating with each other to solve a problem. An agent is an autonomous entity that is at least partially independent, such as a process, a robot or a human [1]. In recent years MAS have been an active research field. Multi-agent based systems offer many advantages for managing computer networks and data centres, including real-time incident isolation and correction, which minimizes mean time to recovery, and improved productivity through the automation of manual tasks [2]. Current network management systems perform basic tasks in response to an incident and rely heavily on human intervention to correct faults [3]. In this decade, significant improvements in semantic technology have been used to improve the decision-making process over time [4]. MAS, combined with semantic technologies, can reduce the cognitive load of system decision making by promoting collaborative problem solving among systems called agents. Agents are capable of reasoning and interacting with their environment, making decisions based on their beliefs, desires and intentions [1]. The agents interact with each other,

engaging in cooperative decision making. This is different from a typical computer program, which has a very structured and rigid process for interaction. This difference allows MAS to be applied to simulations that closely represent real-life scenarios.

Network monitoring is essential for a view of network health. It also allows the network operations center (NOC) to observe changes in the behavior of network elements. Network management has become increasingly distributed because its complexity requires a highly specialized team for each area [5]. The distribution of tasks among specialized teams creates an opportunity for intelligent agents to take on delegated tasks, reducing the repeat tasks performed by subject matter experts (SMEs). The monitoring process can be defined as the collection, interpretation, and presentation of information concerning objects or software processes [6]. Common areas of network monitoring include configuration, fault, accounting, and security management. As part of monitoring, system behavior is observed, and information is collected and then used to make decisions. Many systems and processes (e.g. servers, network devices, software processes) require monitoring. This essay focuses only on network monitoring.

The status of a monitored object may change, and each change is considered an event. An event can trigger other events: for example, a threshold being reached or a failure on a secondary node might trigger a critical notification to raise awareness of the issue, or simply produce a notification report to senior management advising of the potential for outages or service degradation. The model shown in Figure 1 shows the main activities involved when a network device is experiencing a problem such as the failure of one or more components.

1. Incident detection: the monitoring system receives a notification.
2. Incident treatment: validation, correlation, filtration, etc.
3. Presentation: information is presented to users in an appropriate form.
4. Resolution: the necessary action is performed to resolve the issue.

Figure 1: High Level Fault Detection, Treatment, and Resolution Workflow.

As part of this essay, an intelligent agent-based network management architecture is proposed and a proof of concept is designed. The objective of this proof of concept is to

show the possibilities of intelligent agent-based network management while relying on current fault management systems for information gathering. Therefore this essay does not focus on how faults are received by agents; rather, faults are treated as a data source. Depending on its type, a triggered event may not require any action; such an event is referred to as a notification event. Other events require one or more actions in order to recover from the state that triggered them; throughout this essay such events are referred to as incidents or faults interchangeably. The actions vary greatly depending on the incident, and the one detailed in Figure 1 requires remote validation and intervention in order to be corrected. Interventions where a technician is physically present on site can be time consuming, and reducing them is critical to economic efficiency; remote troubleshooting is therefore introduced, which is an ideal role for an intelligent agent. Deploying a device-specific agent that collaborates with other agents to perform its task makes the use of agent technology scalable, since new agents can be created as new devices are brought in to be managed [7].

The process of network management is highly dynamic, being affected by both internal and external factors. These range from unexpectedly high traffic volume and network outages caused by the loss of major components to external factors such as weather conditions affecting transmission towers. The dynamic nature of computer networks and of the way they are managed makes MAS a strong candidate. If a major network outage occurs and several devices need to be recovered and restored, it may take network technicians and subject matter experts hours or days to recover fully using manual processes. With intelligent agent technology, on the other hand, the recovery time may be greatly reduced by having agents perform the recovery activity.
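The distinction between notification events and incidents, and the four stages of Figure 1, can be illustrated with a small sketch. It is only an illustration: the `Event` fields, the 0 to 5 severity scale and the `ACTION_SEVERITY` threshold are assumptions introduced here, not part of the essay's design or of any existing fault management system.

```python
from dataclasses import dataclass
from enum import Enum


class EventKind(Enum):
    NOTIFICATION = "notification"  # informational, no action required
    INCIDENT = "incident"          # requires corrective action (a fault)


@dataclass
class Event:
    device: str
    message: str
    severity: int  # hypothetical scale: 0 = informational, 5 = critical


# Assumed cut-off; the essay does not define a numeric threshold.
ACTION_SEVERITY = 3


def classify(event: Event) -> EventKind:
    """Treatment stage: decide whether the event needs any action."""
    if event.severity >= ACTION_SEVERITY:
        return EventKind.INCIDENT
    return EventKind.NOTIFICATION


def handle(event: Event) -> str:
    """Walk one detected event through treatment, presentation and resolution."""
    kind = classify(event)                                        # treatment
    summary = f"[{kind.value}] {event.device}: {event.message}"   # presentation
    if kind is EventKind.INCIDENT:                                # resolution
        return summary + " -> dispatched for remote resolution"
    return summary + " -> logged only"
```

In a real deployment the classification would more likely come from the correlation rules of the fault management system than from a single severity number; the threshold simply keeps the sketch short.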

The agent roles that are involved in network incident management are shown in Figure 2.

Figure 2: High Level Event Detection, Treatment, and Resolution Process

The process of event management shown in Figure 2 may involve all or some of the following resources: network operations technician, tier 2 technician, field technician and subject matter expert. Other resources, such as vendor support, may be involved to aid the resolution of an issue. Each of the roles described provides skills necessary for the above tasks. The decisions taken by each role, and its level of knowledge about similar past experiences, may affect the resolution time [8]. Uncertainty about whether the right resources will be available during a network incident also affects the time to resolution; having agent resources of all skill levels available at all times addresses this concern.

CHAPTER II LITERATURE REVIEW

A complete Fault Management System has functions to detect, isolate, and correct incidents in a telecommunications network; however, the incident correction functionality is not easily achieved with existing systems [9]. One of the major obstacles to achieving automatic incident correction is the lack of a common platform for interacting with network devices. Other challenges include the complexity of networks and the variety of devices requiring management [9]. The ever-increasing expectation of high-availability systems, together with competitive pressure, motivates service providers to look for new ways of managing their infrastructure. MAS have been used successfully for a variety of applications ranging from computer games to transportation and logistics. A significant amount of research is available on the use of MAS for managing power distribution systems, where agent technologies are used for fault isolation as well as automated restoration [10].

Several articles related to MAS were reviewed and contribute to this essay. The review of related literature found little recent academic research on network incident management using agent technologies. However, some articles advocate self-healing networks that use agent technologies by modeling network devices as agents [11]. The aim of this essay is to extend this work and show how MAS can be used for network event management in order to increase network availability by automating fault correction through remote agents. This approach allows service providers to take advantage of agent technologies without having to fully upgrade their network devices. In order to use MAS effectively for network event management, it is important to first explore MAS architecture.

As introduced in Chapter 1, an agent is an autonomous entity that is at least partially independent, such as a process, a robot or a human. This essay does not attempt to settle on one definition or another, but treats an agent simply as a software entity that responds to changes in an environment and is responsible for representing user interests. Agents possess an internal state and make decisions based on their perceptions, as opposed to being told what to do [12]. The fundamental properties of an intelligent agent and its main components are defined during the agent's design phase. In this essay an agent is autonomous, can communicate, and can cooperate with and delegate tasks to other agents.

Before looking into MAS it is appropriate to briefly explore agents and their properties. There are four classes of agents, and it is important to differentiate between them and to select the most appropriate class during the design phase, based on agent functions and system requirements [13-17]:

1. Logic based agents
2. Reactive agents
3. Belief Desire Intention (BDI)
4. Layered Architecture

In this essay agents with proactive components, such as the subject matter expert (SME) agent, are also referred to as intelligent agents. SME agents can be developed to perform complex decision making and are dispatched after reactive agents are unable to correct an incident. Some form of agent coordination, such as the Contract Net

Protocol (CNP) [18] or voting [19], will be necessary in order to distribute tasks to the appropriate agents efficiently. However, the accuracy of the resolution should be a priority, as it is vital to the health of the network.

Multi-agent Systems

Multi-agent systems enable the resolution of complex problems by dividing them into sub-problems [20]. With each agent specializing in a specific task, complex problems can be solved efficiently, and a distributed model works well. Agents in a system may operate on their own and/or pursue common interests. These features are made possible by the MAS communication infrastructure, which allows communication and cooperation among agents. Communication is one of the key aspects of a multi-agent system and is geared to human language [14]. Agents communicate and cooperate among themselves using communication protocols and an evolving language [21]. Agents may send messages to a specific agent, broadcast them to the whole agent community, or narrowcast them to a specific group of agents [22]. The results of a study titled A Cooperative Multiagent Framework for Self-Healing Mechanisms in Distribution Systems show that two-way communication between agents provides a good solution for fault isolation and an effective restoration plan [21]. Two-way communication will play an important role in validating the current state of devices within the network, since an alternative plan may be necessary as device states change.

Computer networks are increasingly complex and dynamic, with a variety of devices interconnected to provide services across platforms, such as the delivery of traditional television service to mobile devices. Internet Service Providers (ISPs) are providing a range of new

services in response to customer demand [24]. Over the last decade, the growth of mobile technology and the delivery of a variety of services over mobile networks have been an enormous change for traditional ISPs. Delivering new and existing services over a variety of platforms requires a rigorous process for managing those services. As part of effective network management, it is important to become aware of problems before or as soon as they occur. It is equally important to isolate and resolve such problems without affecting services and customer experience. The impact on end users is greatly reduced by having redundancy in parts of the system; however, not all systems are redundant, for various reasons including cost. Using traditional operational support systems for fault management is not optimal because resolution depends on manual human intervention, whereas automation would greatly reduce resolution time and the number of manual repeat tasks. In order to resolve critical issues more quickly, subject matter experts are engaged promptly, after only minimal effort has been made to correct the issue, which results in a high workload for them. The number of repeat tasks for subject matter experts is reduced by training tier 1 and tier 2 technicians to assist with some of the common tasks. A solution that automates the tasks performed by tier 1 and tier 2 technicians would therefore greatly reduce the manual human intervention needed to correct such issues. However, doing so in an environment as complex as the computer networks operated by ISPs today would require a framework that allows SMEs to develop procedures to be performed in the event of issues. MAS and artificial intelligence techniques have been proposed for automated wireless intrusion detection to reduce human intermediation [25].

Automation has been adopted by many industries, including for automated fault diagnosis and correction on servers, where a number of Operational Support Systems (OSS) exist that specifically target automated fault correction [26]. Although automation is achieved through a client

and server application, server management has greatly benefited from such tools, which allow server administrators to monitor servers and perform predefined automated actions, reducing manual and repeat tasks and, as a result, increasing server stability [5]. However, traditional ISPs with a variety of network devices are unable to take advantage of such existing systems, because the architecture of these fault management systems requires an agent application to be installed locally on the server [26]. Today's networks are made up of many different devices with few commonalities and varying proprietary operating systems. In some cases network devices are treated as appliances on which ISPs are not permitted to install third-party applications for purposes such as monitoring. The success of server management, on the other hand, comes mainly from the commonality of the base operating systems: there are only a handful of commonly used operating systems, which makes it practical for vendors of fault management systems to develop client applications. Agent clients installed locally on servers monitor various aspects of the servers, such as processes, physical components and performance-related activities, and also allow server administrators to define automated actions to be performed [26].

Because of the lack of current research on the use of MAS for network incident management, no case studies were found indicating that agent cooperation in resolving network issues would be better than current practice. However, case studies on the use of MAS for automated power system restoration show successful power restoration with minimal impact on the end user [26-28]. The authors of Conceptual Design of A Multi-Agent System for Interconnected Power Systems Restoration used simulation to demonstrate the use of agent technology to aid power restoration. The authors investigated three cases of power system failure. The first failure case had minimal impact. In the second

scenario one of the generators failed and load balancing was required. The third case involved both buses having faults concurrently. In all three cases the multi-agent systems approach was effective in restoring service [28].

Gulnara Zhabelova and Valeriy Vyatkin (2012), in an article published by IEEE, outline how multi-agent systems can be used to achieve self-healing power grids using collaborative fault isolation and restoration [27]. The proposed approach automatically senses a fault in the power system and, after correlating and isolating the problem, switches customers to a non-faulted section of the system [27]. Other research on intelligent energy systems indicates the importance of monitoring performance in addition to faults, as a way to reach an optimal network configuration during a failure in support of self-healing functionality [29]. IBM researchers have demonstrated, using practical scenarios, how agent technology can help manage power consumption in a medium-sized server cluster [30].

Effective network management is essential to the success of ISPs, as networks are the backbone of the services they provide [5]. As this essay has found, multi-agent systems have been used effectively for fault management and automated fault restoration in power systems. There are a number of reasons why MAS are well suited to managing distributed systems, such as agent communication and negotiation when solving problems collaboratively, and agent adaptability in dynamic environments [23]. Today's computer networks are dynamic and are becoming even more so as additional services are offered over the same infrastructure [31]. Reports indicate that the number of Internet-connected devices is on the rise and is estimated to reach 75 billion by the end of 2020, which translates to about nine connected devices per person [31].

The growth of the Internet of Things makes the case for automated fault correction as a step towards self-healing networks. As already mentioned, the need to automate fault correction is not unique to computer networks. Automated network fault correction has many similarities with modern smart power grids [32]. ISPs can build on success in other areas, such as the power systems explored in this essay, by using multi-agent systems for monitoring and automatic resolution of network-related issues. The research results suggest that the multi-agent systems approach, with agent cooperation and autonomy, has a major advantage over conventional systems [32-33]. Existing methods of fault management rely extensively on human intervention, and the expected rapid growth of the Internet of Things makes multi-agent systems a sustainable long-term solution for automating fault management in computer networks [33]. Finally, MAS have been shown to work well in dynamic environments, as is the case with computer networks.
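This chapter names the Contract Net Protocol as one candidate for distributing tasks among agents. The following is a simplified, single-round sketch of that idea: a manager announces a task, contractors either bid or refuse, and the manager awards the task to the lowest bid. The class names, the skill labels and the use of current workload as the bid cost are illustrative assumptions and do not come from the essay or from any agent platform.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Bid:
    agent: str
    cost: float


class ContractorAgent:
    """A hypothetical technician agent that can bid on tasks it is skilled for."""

    def __init__(self, name: str, skills: Iterable[str], load: int):
        self.name = name
        self.skills = set(skills)
        self.load = load  # assumed cost measure: number of tasks already held

    def propose(self, task_skill: str) -> Optional[Bid]:
        """Answer a call for proposals with a bid, or None to refuse."""
        if task_skill not in self.skills:
            return None
        return Bid(self.name, float(self.load))


def award(task_skill: str, contractors: List[ContractorAgent]) -> Optional[str]:
    """Manager side: collect bids and award the task to the cheapest bidder."""
    bids = []
    for contractor in contractors:
        bid = contractor.propose(task_skill)
        if bid is not None:
            bids.append(bid)
    if not bids:
        return None  # nobody can take the task; the manager would escalate
    return min(bids, key=lambda b: b.cost).agent
```

A fuller implementation would also cover bid deadlines, explicit accept and reject messages, and result reporting, which this sketch leaves out.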

CHAPTER III METHODOLOGY

This essay presents the design of a multi-agent system aimed at network management, specifically automated incident management. High-level diagrams are used to describe the process for identifying and correcting a network fault. The overall system should be designed to be open, to allow ISPs to add new agents quickly as new devices are introduced into the network.

1. Device: one or more devices of a type may exist, and each device is responsible for a given task. Devices within an Internet Service Provider (ISP) may range from network devices to billing systems. ISPs may configure their devices to trigger a notification event when part of a device, or the entire device, is at fault. An event or fault occurs as a result of factors such as hardware or software problems, capacity, intrusion, etc.
2. Fault Management System: receives a notification as an indication of an event such as a fault. The notification is delivered in one of several forms, through an Element Management System (EMS) or directly from the end device. If no EMS is present and the device is incapable of sending a notification, the device may be polled periodically for logs or other information by the fault management system or through custom middleware. Event validation, correlation, and filtration may be performed in order to reduce some of the work for NOC Administrators.
3. NOC Administrator: usually performs tier one tasks for events received, by creating a trouble ticket and dispatching it to the appropriate queue, usually based on

technology or service. In the ideal scenario, event severity is used to determine whether immediate attention is required; if so, the appropriate technicians are contacted by phone to investigate the issue.
4. Technician: performs the tier two role by investigating issues, and may attempt the steps necessary to resolve the problem. Technicians are classified by their roles and responsibilities, such as network technician or cable technician, and may perform tasks remotely.
5. Subject Matter Expert: performs the tier three role in troubleshooting and resolution. SMEs specialize in a specific area or technology and are highly trained on the systems they support.
6. Incident Manager: gathers information related to a given incident to study its impact, and may send a notification to the service advisory distribution list describing the issue and its impact on customers. The notification may contain the number of customers affected and, if business customers are affected, their names may be listed. The Incident Manager works closely with the NOC and field technicians to provide subsequent updates.
7. Vendor Technical Support: unresolved issues may be escalated to vendor support for investigation and resolution. Vendor support may refer the issue to their own design teams in order to resolve it. Issues escalated to vendors are usually complex, and escalation takes place after in-house SMEs have been unable to resolve the issue.
8. Business Intelligence Analyst: responsible for finding repeat problems based on problem history by analyzing network event data. Usually the feedback is sent to

subject matter experts and/or engineering to find the root cause and/or alternative products, depending on the outcome of the root cause analysis. This process in turn may trigger other processes that are outside the scope of this essay.

During the design stage, the roles detailed in the steps above will be mapped onto one or more agents responsible for performing the role of the stakeholders involved. In order to bring network events into the multi-agent system, it will be necessary to create one or more fault management system agents responsible for retrieving event information and communicating it to the appropriate device agents for further processing. Event correlation logic may be in place to discard notification events that do not require any action. Since the aim of this essay is to outline how an agent can be designed for an existing device and be responsible for its management, it is necessary to create an agent for each device type in order to keep these technician agents as simple as possible, whereas in the real world a technician may be trained on more than one device. Technician agents will communicate with network devices of their type and perform actions defined by a human subject matter expert when an event is triggered; device agents can therefore be reactive agents. There will be one or more subject matter expert (SME) agents responsible for each device type. Unlike their technician agent counterparts, SME agents can have a reactive component as well as a BDI component working towards the long-term goal of network stability. The balance between the reactive and BDI components depends on the network device they are responsible for managing.

The Incident Manager (IM) agents can be mapped onto logic based agents. For example, their decision to send a notification about service outages or customer impact caused by a fault in the network can be based on the percentage or number of customers affected. The role of

the Vendor Technical Support agents would be limited, given that today's method of communicating with vendor support, submitting trouble tickets, has to remain a manual process. However, if vendor ticketing systems are robust and accept system-generated tickets, then creating an agent to automate the ticketing process may be feasible. The role of the Business Intelligence Analyst agent is to dynamically analyze and report repeat patterns of failure; the information collected can be used by engineers to improve the system design in order to prevent repeat failures.

The Prometheus methodology is used to design the system. The Prometheus methodology is a detailed procedure for the specification, design and implementation of intelligent agent systems [34]. It was developed by the Royal Melbourne Institute of Technology (RMIT) in close collaboration with Agent Oriented Software Pty. Ltd., the company that sells JACK Intelligent Agents, a commercial agent platform. The Prometheus development process consists of three major phases:

1. The specification of the system involves identifying the goals and sub-goals of the system. The system specification phase describes how the agents will interact with the environment. An agent's interactions with the environment are called actions and are to be distinguished from incidents (events). The goals describe the task the system is to solve, and the functionalities describe the individual functions by which this occurs.

2. In the architectural design phase, numerous diagrams are created to specify the structure of the system. Agents are identified, as well as the events they have to react to and the actions they need to perform to affect the environment.
3. The detailed design phase looks at the internal workings of each agent in terms of capabilities, events, data structures, and plans [34]. Agents are plan based and rely on user-defined plans, which is a major advantage for network management. Agent plans are defined based on the devices they manage. In this phase all events are defined, including external events, events between agents and events within agents.

The result of this essay is the design of a multi-agent based system aiming for automated incident isolation and correction.
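To make the agent mapping described in this chapter more concrete, the sketch below shows one way a reactive technician agent driven by SME-defined plans, and a logic based Incident Manager rule, might look. It is a hedged illustration only: the class and function names, the event type strings and the 5% advisory threshold are assumptions made here, and the code is not tied to JACK or to any Prometheus tooling.

```python
from typing import Callable, Dict

# A plan is a hypothetical SME-written procedure: it receives a device name
# and returns True if the corrective steps succeeded.
Plan = Callable[[str], bool]


class TechnicianAgent:
    """Reactive agent: maps an event type directly to a predefined plan."""

    def __init__(self, device_type: str):
        self.device_type = device_type
        self.plans: Dict[str, Plan] = {}

    def add_plan(self, event_type: str, plan: Plan) -> None:
        """Register the plan a human SME defined for this event type."""
        self.plans[event_type] = plan

    def handle(self, event_type: str, device: str) -> str:
        """Run the matching plan; escalate if none exists or the plan fails."""
        plan = self.plans.get(event_type)
        if plan is None:
            return "escalate"
        return "resolved" if plan(device) else "escalate"


def should_send_advisory(impacted: int, total: int, threshold: float = 0.05) -> bool:
    """Logic based Incident Manager rule on the share of customers impacted.

    The default threshold is an assumed value, not one given in the essay.
    """
    if total <= 0:
        raise ValueError("total number of customers must be positive")
    return impacted / total >= threshold
```

Keeping the plans as data registered on the agent, rather than hard-coding them, mirrors the point that new device types can be supported by adding agents and plans without changing the rest of the system.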

CHAPTER IV DESIGN

The Prometheus methodology is followed in this chapter for system specification, high-level architectural design and detailed design. The system specification includes the identification of agents, the initial goals and the functionality descriptors. In the high-level architectural design stage the overall system structure is described using a system overview diagram, whereas the detailed design stage is concerned with the internals of each agent [36]. Prometheus is an iterative methodology aimed at the design and development of intelligent agents [35]. More information on the Prometheus methodology can be found in Chapter III and in the reference section.

As previously mentioned in Chapter III, computer networks have become complex with the recent addition of new services, from video on demand to next-generation home security [37]. Because of the complexity of networks and the variety of problems they may experience, the focus of this chapter is limited to one specific incident. The incident outlined in Figure 1 is that of a primary network switch experiencing a problem with one of its physical links, on which there are three virtual interfaces connected to different routers. The network events generated as a result of this problem are listed below:

Primary switch sends a notification indicating failure in one of its links.
Secondary switch sends a notification indicating loss of its peer.
The three routers each send a notification indicating loss of primary link.
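The five notifications above all stem from a single fault on the primary switch. One simple way to correlate them, sketched here under the assumption that a dependency map between devices is available, is to keep only the alarming devices whose upstream dependency is not itself alarming. The device names and the `DEPENDS_ON` map are hypothetical and stand in for whatever topology data a fault management system would supply.

```python
from typing import Dict, List

# Hypothetical topology for the incident: each device maps to the device
# it depends on. The primary switch has no upstream dependency listed.
DEPENDS_ON: Dict[str, str] = {
    "secondary-switch": "primary-switch",
    "router-1": "primary-switch",
    "router-2": "primary-switch",
    "router-3": "primary-switch",
}


def root_causes(alarming: List[str], depends_on: Dict[str, str]) -> List[str]:
    """Return the alarming devices that are not explained by an upstream alarm."""
    active = set(alarming)
    roots = []
    for device in alarming:
        upstream = depends_on.get(device)  # None when no dependency is known
        if upstream not in active:
            roots.append(device)
    return roots
```

With all five devices alarming, only the primary switch remains, so a single device agent would be notified instead of five. If a router alarms on its own, it is reported as its own root cause.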

The system actors are NOC Administrator, Technician, Subject Matter Expert (SME), Incident Manager, Vendor Technical Support, and Business Intelligence Analyst. In this section we identify the tasks associated with each of the actors:

Table 1: Scenarios

NOC Administrator tasks
  Collects information about the device
  Finds the appropriate dispatch procedure
  Creates an incident ticket and dispatches it to the appropriate queue
  Pages the prime technician if the event occurs outside the working hours of the team and requires immediate attention based on the information gathered

Technician tasks
  Accesses the ticketing system for newly assigned work
  Attempts to identify problems with the device or service
  Attempts to resolve the problem
  If unable to identify the cause or unsuccessful in resolving the issue, contacts the device subject matter expert
  Updates the ticketing system

Subject Matter Expert tasks
  Accesses the ticketing system for newly assigned work
  Reviews the procedures followed by the technician agent
  If unable to identify the cause or unsuccessful in resolving the issue, contacts device vendor technical support
  Collaborates with the vendor support team to provide access to the device or send diagnostics data

Incident Manager tasks
  Accesses the ticketing system periodically for new incidents
  Accesses the ticketing system periodically for changes to existing open tickets
  Assesses the situation by identifying customer and service impact
  Sends notifications to stakeholders
  Escalates incidents that do not get resolved quickly

Vendor Technical Support tasks
  Reviews the ticketing system for customer tickets
  Attempts to assess the issue using the information provided

27 Communicate with customer to perform a procedure and or ask for more details Communicate internally with other teams Business Intelligence Analyst tasks Analyze ticketing system and fault history Isolate devices with repeat patterns of problems Communicate the results of findings with stakeholders In this section tasks outlined in Table 1 for each actor can be presented as use cases. The use case is Use Case 1: A network device loses one its communication links Fault management system receives event notifications via SNMP traps from multiple devices. Fault management system correlates events based on some predefined logic to isolate the issue with primary switch. A NOC agent is created that will respond to the event by gathering device information from different sources such destination queue. The NOC agent s goal is to identify a device technician agent. A device agent is notified by the NOC agent to look into the issues. If event correlation was not done by the fault management system the NOC agent will notify a device agent for each and every device from which a notification was received. 27

- The device agent receiving the notification attempts to identify the root cause of the issue.
- After verifying the interface card, the device agent is able to conclude that the issue is not the result of a component failure.
- With the root cause of the issue unknown, the device agent notifies the Incident Manager Agent to issue a threat advisory.
- The Incident Manager Agent is created and continues to issue periodic updates until the issue is resolved, at which point a final update is sent.

Based on the scenarios identified we can determine the system goals. The initial goals of the system are based only on Use Case 1.

Table 2: Initial Goals
- Receive a Notification
- Correlate Events
- Assign a Technician
- Troubleshoot an Issue
- Assign SME
- Update the Ticketing System
- Send Service Advisory Notification
- Escalate Unresolved Issue
- Identify Repeat Problem

From the initial goals we can identify additional sub-goals.

Table 3: Detailed Goals

Receive a Notification
- Keep track of active incidents
- Update list of active incidents
- Archive resolved incidents

Correlate Events
- List active events
- Look for relationships between events
- Group events based on relationship
- Present a few events that are really important

Assign a Technician
- List technicians
- Look for an appropriate technician
- Provide event and ticket details
- Notify for device status changes
- Keep track of active technicians

Troubleshoot an Incident
- List active incidents
- Search for resolution procedures
- Select resolution procedure
- Attempt resolution procedure
- Confirm resolution
- Update ticketing system

Assign SME
- List SMEs
- Look for an appropriate SME
- Provide event and ticket details
- Review attempted steps performed by technician
- Notify for device status changes
- Keep track of active SMEs

Update the Ticketing System
- Search using ticket number
- View event details
- Update ticket with steps performed for issue resolution
- Change ticket status

Send Service Advisory Notification
- Search for active tickets
- Filter on ticket severity requiring service advisory notification
- Associate services/users impact caused by event
- Distribute findings

Escalate Unresolved Incident
- Search for active tickets
- View service-level agreements (SLAs)
- Escalate issue to meet SLAs
- Update ticketing system

Identify Repeat Problem
- Search for archived events
- Group events by device and root cause
- Identify repeat problems
- Calculate resolution costs
- Send periodic reports

The goals identified in Table 3 can be grouped by function, removing repeated functions. The groupings listed in Table 4 describe system functionality.

Table 4: Functionality Descriptors

Event Function
- Accept new events
- Keep track of active events
- Update event details based on new events and actions
- Look for relationships between events
- Group events based on relationship
- Present a few events that are really important

Resource Function
- List resources
- Look for an appropriate resource
- Provide event and ticket details
- Notify for device status changes
- Keep track of active resources
- Search for resolution procedures
- Select resolution procedure
- Attempt resolution procedure
- Confirm resolution

Ticketing System Function
- Search using ticket number
- Search using device name
- View event details
- Update ticket notes
- Change ticket status

Administrative Function
- Search for active tickets
- View service-level agreements (SLAs)
- Escalate issue to meet SLAs
- Search for archived events
- Filter on ticket severity requiring service advisory notification
- Associate services/users impact caused by incident
- Group events by device and root cause
- Identify repeat problems
- Calculate resolution costs
- Send periodic reports

The percepts that intelligent agents receive from the external environment play an important role in network management. The following percepts are identified:

Table 5: Percepts

Receive new events
The system receives new events to be added to the list of active events and subsequently dispatched to a resource for troubleshooting.

Receive updates for existing events
The system receives updates for active events from network devices. The updates are forwarded to the assigned agents.

Ticket closures
The system receives notification of a ticket closure by human users after they perform activities to correct issues, such as replacing a power supply.

Problem escalations
The system receives notification if issues are not resolved within a defined period of time based on severity.

Power outages
The system receives notification when a central office switches its power source to battery/generator.

Service outages
The system receives notifications when there are service outages.

In this section of the design we can define actions. Actions are the opposite of percepts: they are communications from intelligent agents to the external environment. The actions of the system are defined in Table 6.

Table 6: Actions

Resolve device issues
The overall purpose of the system is to resolve network issues.

Receive event notifications
A notification is received by the system indicating a problem with a device.

Request device related information
Device information such as credentials is accessed.

Troubleshoot issues
Establish a connection with the device and identify issues.

Run commands
Agents run specified commands in an attempt to resolve issues.
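The escalation notification described above fires when an issue stays unresolved beyond a period that depends on its severity. A minimal sketch of that check follows. The severity names and time windows are illustrative assumptions, not values from the essay; in practice they would be derived from the service-level agreements.

```python
from datetime import datetime, timedelta

# Hypothetical resolution windows per severity (assumed values for illustration).
ESCALATION_WINDOW = {
    "critical": timedelta(minutes=30),
    "major": timedelta(hours=4),
    "minor": timedelta(hours=24),
}


def needs_escalation(opened_at, severity, now, resolved=False):
    """Return True when an unresolved event has outlived its severity window."""
    if resolved:
        return False
    return now - opened_at > ESCALATION_WINDOW[severity]
```

An agent could evaluate this check periodically for every active event and raise an escalation for each one that returns True.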

The initial functionality descriptors provide an abstract view of the system by combining percepts, actions and goals. The system performs the functionalities listed in Tables 7 to 10 together with others such as agents, humans, or other systems.

Table 7: Event Functionality
Functionality: Event
Description: This functionality manages the list of active events. Resources are dispatched to troubleshoot and resolve incidents reported using the event. Events reported to resources are tracked using the ticketing system.
Goals: Accept new events, Keep track of active events, Update incident details based on new events and actions, Look for relationships between events, Group events based on relationship, Present a few events that are really important
Actions: Assign a resource
Triggers: New event, Update to an existing active event
Information Used: Event description, severity, device name, type
Information Produced: Assign a resource

Table 8: Resource Functionality
Functionality: Resource
Description: This functionality manages various resources such as technicians and subject matter experts (SMEs). Resources are assigned to troubleshoot and resolve issues. A technician agent can assign an incident to an SME agent when unsuccessful in resolving the issue.
Goals: List resources, Look for an appropriate resource, Provide event and ticket details, Notify for device status changes, Keep track of active resources, Search for resolution procedures, Select resolution procedure, Attempt resolution procedure, Confirm resolution
Actions: Troubleshoot, resolve issues
Triggers: Assign SME resource
Information Used: Event details, Device type
Information Produced: Resolve issue, escalate issue

Table 9: Ticketing System Functionality
Functionality: Ticketing System
Description: This functionality tracks event related activities such as the resource assigned and the actions performed. The system identifies devices by name and provides device specific details. It also allows a search of all tickets created for a specific device. If issues are not resolved within a specified time period, it sends a notification.
Goals: Search using ticket number, View event details, Search using device name
Actions: Update ticket notes, Change ticket status
Triggers: Escalate ticket
Information Used: Event details
Information Produced: Event maintenance history
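The goals and actions of the Ticketing System functionality (search by ticket number, search by device name, update ticket notes, change ticket status) can be pictured as a small in-memory store. This is a sketch under stated assumptions only: the class names, fields and status values are hypothetical and do not describe any real ticketing product.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    number: int
    device: str
    event_details: str
    status: str = "open"  # hypothetical status values: "open", "resolved"
    notes: list = field(default_factory=list)


class TicketStore:
    """Hypothetical in-memory stand-in for the ticketing system."""

    def __init__(self):
        self._tickets = {}
        self._next_number = 1

    def create(self, device, event_details):
        ticket = Ticket(self._next_number, device, event_details)
        self._tickets[ticket.number] = ticket
        self._next_number += 1
        return ticket

    def by_number(self, number):
        # Search using ticket number; returns None when no such ticket exists.
        return self._tickets.get(number)

    def by_device(self, device):
        # Search using device name; returns every ticket created for the device.
        return [t for t in self._tickets.values() if t.device == device]

    def add_note(self, number, note):
        # Update ticket notes with a step performed during resolution.
        self._tickets[number].notes.append(note)

    def set_status(self, number, status):
        # Change ticket status.
        self._tickets[number].status = status
```

The accumulated notes of a ticket correspond to the event maintenance history that the functionality descriptor lists as information produced.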

Table 10: Administrative Functionality
Functionality: Administrative
Description: This functionality keeps track of event related activity such as actions performed when a problem occurs. The resulting information is used by the Business Intelligence agent to find repeat problems.
Goals: Search for active tickets, View service-level agreements (SLAs), Search for archived events, Filter on ticket severity requiring service advisory notification, Associate services/users impact caused by event, Group events by device and root cause, Identify repeat problems, Calculate resolution costs
Actions: Report
Triggers: Escalate issue to meet SLAs, Send periodic reports
Information Used: Event, Device, and Resources data
Information Produced: Report of devices with repeat problems

In the high-level architectural design stage of Prometheus, the overall structure of the system is described using a system overview diagram. The System Overview Diagram is produced based on the system specifications discussed so far in this chapter.

Figure 3: System Overview Diagram

The data-coupling diagram presented in Figure 4 is based on the initial functionality descriptors and the initial system overview diagram.

Figure 4: Data Coupling Diagram

As part of the high-level architectural design stage of Prometheus, we can define which agents will exist within the system. Based on the system overview diagram there are six types of agents, pictured in Figure 5:

- NOC Administrator Agent
- Technician Agent
- Incident Manager Agent
- Subject Matter Expert Agent
- Business Intelligence Analyst Agent
- Device Vendor Technical Support Agent

Figure 5: Agent Acquaintance Diagram

The interaction diagram pictured in Figure 6 describes the arrival of a new incident that can be resolved by the technician agent.

Figure 6: Interaction diagram when a new incident arrives

The interaction diagram shown in Figure 7 describes a scenario where an incident requires escalation to the Subject Matter Expert as well as to Device Vendor Technical Support before being resolved. However, it does not involve the Incident Manager Agent.

Figure 7: Interaction diagram when a new incident arrives requiring escalation

The interaction diagrams presented in Figures 6 and 7 cover a basic scenario. For the purpose of this essay only a few interaction diagrams were produced. Depending on the complexity of the network and the devices to be managed, there will be several additional interaction diagrams for each scenario.

Agents must interact with each other to perform their tasks. Agent communication can be achieved through the communication protocol presented in Figure 8. The aim of a communication protocol is to allow agents to synchronize and exchange messages.

Figure 8: Main Interaction Protocol

As part of the detailed design stage of Prometheus, the final System Overview Diagram can be produced using the previous diagrams and the system specification. The System Overview Diagram is presented in Figure 9, where interactions between agents are made through the main communication protocol. It also shows that each agent has actions, percepts and messages associated with it.

Figure 9: Final System Overview Diagram

A multi-agent system approach can be applied to a variety of problems aimed at automating manual processes. As previously explored in Chapter III, MAS has been advocated as a way of automating fault correction in power grid networks. The MAS approach to automated fault correction in computer networks faces greater challenges than in power grid networks because of the variety of services offered over these networks, which end users of such services regard as one. However, as proposed in this essay, multi-agent systems can be beneficial in automating some of the basic tasks performed by network technicians.

There are several ways of identifying system improvements achieved using a Multi-Agent System (MAS); companies may have their own way of measuring such improvements, based on the problems that led them to look into solutions such as MAS. One way to measure network fault correction after adopting MAS is to compare it against the time it takes to resolve similar incidents using the manual process. However, this is not ideal: the manual process involves a human reacting to the incident, which can introduce time delays, whereas automated fault correction using MAS requires the development of the procedures an agent would execute in response to an incident. Other benchmarks may include the cost savings from deploying MAS, where the savings would come from a combination of human resource reduction and improved network uptime, as automated actions are performed to resolve incidents with near-instant reaction from agents.

Computation cost among agents allows a tradeoff between time and solution quality. This can play an important role in network fault correction, where an agent makes a tradeoff between a lengthy procedure and a temporary yet quicker solution to restore service. Such decision making can help restore service as soon as possible, after which agents focus on correcting the issue using the original (lengthy) procedure.

An implementation of the multi-agent system approach to automating network fault correction must focus on a specific device type, targeting a subset of faults where the correction process can be clearly identified and the steps performed to correct the fault are the same every time. As discovered in the design chapter, agent communication is minimal and agents are relatively autonomous. This essay suggests a bottom-up methodology for implementing an agent-based fault correction system; this allows the system to be highly adaptable and scalable without relying on abstract representation. However, when following a bottom-up methodology it is important to keep in mind a high-level design of the global system.
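The tradeoff discussed in this chapter, where an agent first applies a quicker temporary solution to restore service and then returns to the lengthy corrective procedure, can be sketched as a simple ordering rule. The procedure names and durations below are invented for illustration, and the sketch assumes at least one permanent procedure is always available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Procedure:
    name: str
    minutes: int     # estimated time to run (assumed to be known in advance)
    permanent: bool  # True if the procedure corrects the root cause


def plan(procedures, service_down):
    """Order procedures: restore service quickly first, then fix permanently."""
    # Assumes at least one permanent procedure exists in the list.
    permanent = [p for p in procedures if p.permanent]
    fastest_fix = min(permanent, key=lambda p: p.minutes)
    if not service_down:
        # No outage: go straight to the corrective procedure.
        return [fastest_fix]
    quickest = min(procedures, key=lambda p: p.minutes)
    # During an outage, run the quickest option first; follow up with the
    # permanent fix unless the quickest option already is permanent.
    return [quickest] if quickest.permanent else [quickest, fastest_fix]
```

For example, given a two-minute "reroute traffic" workaround and a ninety-minute "replace interface card" repair, plan returns the workaround followed by the repair while service is down, and only the repair otherwise.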

CHAPTER V CONCLUSIONS AND FUTURE RESEARCH

The research results show that there is great potential for agent technology to be used in dynamic environments, and they support the general viability of MAS for automating manual tasks. Similarly, the MAS approach to network incident management presented in this essay shows great potential for self-healing networks. MAS offers several advantages over conventional systems, mainly through agent autonomy and cooperation, making it ideal for solving complex problems within a network using specialized agents that each focus on a specific issue. The distribution of agents allows for distributed network incident management, with several agents working concurrently, each with its own focus area. This allows network managers to build and deploy new agents as they add new device types to their networks, making the MAS approach significantly scalable.

Through integration of new technologies such as artificial intelligence and semantics, multi-agent systems are increasingly flexible in dealing with uncertainties in dynamic environments. These features will allow powerful new agent-based applications to support ever-growing computer networks, and will allow agent internals to be changed to accommodate device software upgrades or simply to add tasks. This is a great advantage in itself over a single system being responsible for managing all devices. It allows changes to be made to one specific device agent without disturbing others, in turn minimizing the risks to the network. Other advantages include the ability to automate incident resolution, minimizing manual tasks and reducing recovery times.

However, it is important to note that multi-agent systems lack a common conceptual and technological foundation for mass deployment in production environments. Future developments in MAS, as well as the future growth of the Internet of Things, will promote the notion of self-healing networks, where MAS can play an important role. Finally, the methodology proposed in this essay is practical when used for basic automation as a starting point, followed by an iterative approach for further improvements as network device behaviors become apparent.

IMPLEMENTATION METHOD & FUTURE RESEARCH

Research shows that software agents and agent-based systems have been an active research area in the past decade. However, most agent-based applications have focused on simulation and modeling issues. In order to verify the usefulness of multi-agent systems for the purpose of network incident management, one must implement the solution with agents performing routine tasks on switches, routers and other network devices within a lab environment. A first-stage implementation of automated fault correction should cover commonly recurring incidents, identified by studying network fault data from the fault management system currently in use. Test results using the MAS approach should be compared to the traditional fault management systems currently in use. The comparisons would support further implementation of multi-agent systems to automate additional incidents.

The Prometheus Methodology can be used to design multi-agent fault management systems in multiple phases in order to reduce the complexity of such implementations. This allows companies to see the benefits of MAS as the system is being implemented across the network. It also allows new discoveries to be incorporated into the system, using an iterative model to refine the design. The Java Agent Development Framework (JADE) can be used as an implementation platform.
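Selecting commonly recurring incidents for a first-stage implementation could start from a simple frequency count over historical fault data. The sketch below is written in plain Python for brevity rather than on JADE. The record fields (device_type, root_cause) are assumptions about what a fault management export might contain, not a format defined in this essay.

```python
from collections import Counter


def recurring_incidents(fault_history, min_count=2):
    """Return ((device_type, root_cause), count) pairs seen at least min_count
    times, most frequent first."""
    counts = Counter((f["device_type"], f["root_cause"]) for f in fault_history)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]
```

The pairs at the top of the returned list are natural first candidates for automation, since their correction steps are most likely to be repeated.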

A technical discussion of performance and availability December IBM Tivoli Monitoring solutions for performance and availability

A technical discussion of performance and availability December IBM Tivoli Monitoring solutions for performance and availability December 2002 IBM Tivoli Monitoring solutions for performance and availability 2 Contents 2 Performance and availability monitoring 3 Tivoli Monitoring software 4 Resource models 6 Built-in intelligence

More information

Network maintenance evolution and best practices for NFV assurance October 2016

Network maintenance evolution and best practices for NFV assurance October 2016 Network maintenance evolution and best practices for NFV assurance October 2016 TECHNOLOGY BUSINESS RESEARCH, INC. 2 CONTENTS 3 Introduction: NFV transformation drives new network assurance strategies

More information

Enhancing Utility Outage Management System (OMS) Performance

Enhancing Utility Outage Management System (OMS) Performance Enhancing Utility Outage Management System (OMS) Performance by John Dirkman, P.E. Executive summary Traditional grid outage management systems suffer from two fundamental flaws: they lack an accurate,

More information

Service Goes Digital! A toolbox for acquiring digital capabilities for your service business

Service Goes Digital! A toolbox for acquiring digital capabilities for your service business Service Goes Digital! A toolbox for acquiring digital capabilities for your service business Service Goes Digital! A toolbox for acquiring digital capabilities for your service business Digitalization

More information

Implementing a Manager of Managers for Effective Fault Management of Public Safety Radio Networks

Implementing a Manager of Managers for Effective Fault Management of Public Safety Radio Networks Implementing a Manager of Managers for Effective Fault Management of Public Safety Radio Networks The case for a Manager of Managers Telecommunications systems today are complex and heterogeneous. Network

More information

MANAGED NOC AND HELP DESK SERVICES

MANAGED NOC AND HELP DESK SERVICES CALL US 1-800-238-6360 MANAGED NOC AND HELP DESK SERVICES A seamlessly integrated unit of your operations We provide you with a seamless experience of owning a Network Operations Center without actually

More information

The Hybrid Automation Revolution

The Hybrid Automation Revolution A I The Hybrid Automation Revolution Why 90% of Automation-Ready Processes Require a Hybrid Human-Robot Approach Sponsored by 1 Introduction RPA (robotic process automation) allows enterprises to reduce

More information

requirements, we developed an MNS foundation that is adaptable to different requirements for size, bandwidth, and complexity.

requirements, we developed an MNS foundation that is adaptable to different requirements for size, bandwidth, and complexity. General Services Administration (GSA) Enterprise Infrastructure Solutions (EIS) requirements, we developed an MNS foundation that is adaptable to different requirements for size, bandwidth, and complexity.

More information

New Solution Deployment: Best Practices White Paper

New Solution Deployment: Best Practices White Paper New Solution Deployment: Best Practices White Paper Document ID: 15113 Contents Introduction High Level Process Flow for Deploying New Solutions Solution Requirements Required Features or Services Performance

More information

GREAT SERVICE NEVER STOPS.

GREAT SERVICE NEVER STOPS. GREAT SERVICE NEVER STOPS. At Tata Communications, we understand that how we do things is every bit as important to our customers as the things that we do. So we re always flexible, always available, and

More information

IBM Infrastructure Security Services - Managed Security Information and Event Management (Managed SIEM)

IBM Infrastructure Security Services - Managed Security Information and Event Management (Managed SIEM) IBM Infrastructure Security Services - Managed Security Information and Event Management (Managed SIEM) DK_INTC-8838-00 11-2011 Page 1 of 17 Table of Contents 1.Scope of Services...3 2.Definitions...3

More information

SESSION 408 Tuesday, November 3, 10:00am - 11:00am Track: Service Support and Operations

SESSION 408 Tuesday, November 3, 10:00am - 11:00am Track: Service Support and Operations SESSION 408 Tuesday, November 3, 10:00am - 11:00am Track: Service Support and Operations Preventative Problem Management: What ITIL Didn't Teach You Gabriel Soreanu Solutions Architect, Cisco Systems gsoreanu@cisco.com

More information

Service management solutions White paper. Six steps toward assuring service availability and performance.

Service management solutions White paper. Six steps toward assuring service availability and performance. Service management solutions White paper Six steps toward assuring service availability and performance. March 2008 2 Contents 2 Overview 2 Challenges in assuring high service availability and performance

More information

IBM Emptoris Supplier Lifecycle Management on Cloud

IBM Emptoris Supplier Lifecycle Management on Cloud Service Description IBM Emptoris Supplier Lifecycle Management on Cloud This Service Description describes the Cloud Service IBM provides to Client. Client means the contracting party and its authorized

More information

Incident Management Process

Incident Management Process OSF Service Support Incident Management Process [Version 1.1] [From https://www.ok.gov/cio/documents/incidentmanagementprocess.doc] Incident Management Process Table of Contents About this document...

More information

Kaseya Traverse Predictive SLA Management and Monitoring

Kaseya Traverse Predictive SLA Management and Monitoring PRODUCT BRIEF Kaseya Traverse Predictive SLA Management and Monitoring Kaseya Traverse Traverse is a breakthrough cloud and service-level monitoring solution that provides real-time visibility into the

More information

The Appliance Based Approach for IT Infrastructure Management

The Appliance Based Approach for IT Infrastructure Management The Appliance Based Approach for IT Infrastructure Management This white paper examines the key issues faced by IT managers in managing the IT infrastructure of their organizations. A new solution using

More information

Optanix Platform The Technical Value: How it Works POSITION PAPER

Optanix Platform The Technical Value: How it Works POSITION PAPER Optanix Platform The Technical Value: How it Works POSITION PAPER Table of Contents The Optanix Clean Signal... 3 Active IT Managed Services... 4 Data Acquisition and Monitoring... 6 The Ingestion Engine...

More information

Service Operation. Scenario One

Service Operation. Scenario One Service Operation Scenario One A large corporation completed implementing a new IT service management framework last month and has selected its new service management tools. The new processes of the framework

More information

IT Event Alerting and Response

IT Event Alerting and Response TelAlert IT Event Alerting and Response Information technology is critical to business and downtime means lost revenue. Maximize uptime with advanced two-way notification built to integrate with IT service

More information

Services. Dell ProSupport. Improve productivity and optimize resources with efficient, flexible, and reliable support

Services. Dell ProSupport. Improve productivity and optimize resources with efficient, flexible, and reliable support Services Dell ProSupport TM Improve productivity and optimize resources with efficient, flexible, and reliable support Shift your resources from maintenance to momentum Dell s award-winning support can

More information

Optimizing Service Assurance with Vitria Operational Intelligence

Optimizing Service Assurance with Vitria Operational Intelligence S o l u t i o n O v e r v i e w > Optimizing Service Assurance with Vitria Operational Intelligence 1 Table of Contents 1 Executive Overview 1 Value of Operational Intelligence for Network Service Assurance

More information

Business Case Overview

Business Case Overview 2018 TM Forum 1 Business Case Overview 2018 TM Forum 2 2018 TM Forum 2 Business Case from Orange Enhancing CX for Orange Livebox customers Orange Livebox is an ADSL/Fiber Optics wireless router available

More information

Uptime Maintenance and Support Services - Appendix. Dimension Data Australia Pty Limited. Uptime Support Services Agreement

Uptime Maintenance and Support Services - Appendix. Dimension Data Australia Pty Limited. Uptime Support Services Agreement Uptime Support Services Agreement Uptime Maintenance and Support Services - Appendix Dimension Data Australia Pty Limited 27 May 2013 Version 1-01 Appendix A. 1. Definitions and Interpretations 1.1 For

More information

Service Operation. Scenario One

Service Operation. Scenario One Service Operation Scenario One A large corporation completed implementing a new IT service management framework last month and has selected its new service management tools. The new processes of the framework

More information

AppManager + Operations Center

AppManager + Operations Center AppManager + Operations Center A Powerful Combination for Attaining Service Performance and Availability Objectives This paper describes an end-to-end management solution for essential business services

More information

IT Sample Duties and Responsibilities Statements BAND A POSITION CONCEPT: ENTRY / INTERMEDIATE / INDEPENDENT WORKER

IT Sample Duties and Responsibilities Statements BAND A POSITION CONCEPT: ENTRY / INTERMEDIATE / INDEPENDENT WORKER Multi-user System Administration Systems & Services Administration Installs, tests, implements, monitors, tunes, and maintains all related software products Rack-mounts servers and installs server hardware

More information

JOB DESCRIPTION. The Subject Position has no responsibility for ongoing and sustained supervision of other staff.

JOB DESCRIPTION. The Subject Position has no responsibility for ongoing and sustained supervision of other staff. JOB DESCRIPTION I. JOB IDENTIFICATION Position Title: Business Analyst Job Code: NEW Position Number: Various Linguistic Profile: BBC Group and Level: ADG E Supervisor Title: Chief, Business and Technology

More information

Business Process Framework (etom) Release Self-Assessment Process Mapping Report Level 2 Process: Problem Handling

Business Process Framework (etom) Release Self-Assessment Process Mapping Report Level 2 Process: Problem Handling Huawei Tech. Co., Ltd Digital CRM R2.1 TM Forum Frameworx 16.0 Certification Business Process Framework (etom) Release 16.0 Self-Assessment Process Mapping Report Level 2 Process: 1.3.7 - Problem Handling

More information

SUPPORT SERVICES FOR DELL EMC VXBLOCK SYSTEMS, VBLOCK SYSTEMS, AND VXRACK SYSTEMS

SUPPORT SERVICES FOR DELL EMC VXBLOCK SYSTEMS, VBLOCK SYSTEMS, AND VXRACK SYSTEMS SUPPORT OVERVIEW SUPPORT SERVICES FOR DELL EMC VXBLOCK SYSTEMS, VBLOCK SYSTEMS, AND VXRACK SYSTEMS Dell EMC provides a range of support options to match your business objectives and preferred support experience.

More information

Customer Advocacy: Maintain a Customer Focus Throughout the Incident Management Process. Julie L. Mohr

Customer Advocacy: Maintain a Customer Focus Throughout the Incident Management Process. Julie L. Mohr Customer Advocacy: Maintain a Customer Focus Throughout the Incident Management Process By Julie L. Mohr The IT organization exists to efficiently provision and support technology to meet business objectives.

More information

Data Protection Management (DPM)

Data Protection Management (DPM) Industry Trends and Technology Perspective White Paper Data Protection Management (DPM) A look at the benefits of DPM for timely and effective data protection management By Greg Schulz Founder and Senior

More information

Incident Management Process
