OPMANTEK NETWORK MANAGEMENT AND IT AUDIT SOFTWARE. Developing a Strategic NOC Service using Opmantek's Commercial Open-Source Solutions, v2 April 2018

Maximilian Holmes

Housekeeping
- Attendees will be on mute during the presentation to prevent interruptions from feedback and background noise.
- If you wish to ask a question, please ask via GoToWebinar's chat. We will have a Q&A session at the end and have allowed lots of time.
- This session will be recorded and made available to all attendees.

Topics for Today
- Differences Between a Traditional NOC Model and a Strategic NOC
- Developing a Service Catalog with Measurable SLAs
- Architecting a Solution for Fast Client On-Boarding and Scalability
- Identify Your Fully Loaded Costs in Deploying and Growing the Strategic NOC

IT Service Management Maturity Model (Increasing Performance & Value to Organization from Level 0 to Level 4)
- Level 0, CHAOTIC: ad hoc; undocumented; unpredictable; multiple help desks; minimal IT operations; user call notification.
- Level 1, REACTIVE: fight fires; inventory; desktop software distribution; initiate problem management process; alert and event management; measure component availability (up/down).
- Level 2, PROACTIVE: analyze trends; set thresholds; predict problems; measure application availability; automate; mature problem, configuration, change, asset and performance mgmt. processes.
- Level 3, SERVICES: IT as a service provider; define services, classes, pricing; understand costs; guarantee SLAs; measure and report service availability; integrate processes; capacity mgmt.
- Level 4, VALUE: IT as a strategic business partner; IT and business metric linkage; IT/business collaboration improves business process; real-time infrastructure; business planning.
Process stages along the model: Tool Leverage; Operational Process Engineering; Service Delivery Process Engineering; Service & Account Management; Manage IT as a Business.

DIFFERENCES BETWEEN A TRADITIONAL NOC MODEL AND A STRATEGIC NOC

Traditional NOC Model
What the traditional NOC model has to offer:
- The NOC is embedded within a larger network/server/application support process.
- It may include maintenance functions (e.g. remote desktop, patching, anti-virus).
- Staff time is split between fault resolution and routine equipment maintenance, including equipment refresh projects.
- Monitoring focuses on equipment state (often siloed between network and server).
- Fault response is primarily reactive, with little or no automation.

Strategic NOC
The strategic NOC model is about improving collaboration and response:
- It understands that customers/lines of business (LOBs) care about user satisfaction, not equipment state.
- The focus is on monitoring application performance and user experience (UX), ensuring end-to-end quality of the network.
- It offers a clearly defined list of services and SLAs, and ties these to pricing/value.
- Initial fault response is automated.
- Self service is a key component at all levels.
- The NOC is actively involved with DevOps, application development, test/QA and deployment.
- The NOC maintains proactive communication channels with customers/lines of business.

Benefits: Why Invest in Converting to a Strategic NOC Model?
- Reduced time spent on routine responses
- Improved reaction time to UX-impacting faults
- Reduced time to fault resolution
- Ability to predict outages and degradation

DEVELOPING A SERVICE CATALOG WITH MEASURABLE SLAS

Service Catalog
What services and SLAs will your strategic NOC offer?
- Application Monitoring (offer at least 2 tiers; a 3-tier system is recommended):
  - Bronze: monitors the underlying equipment and services required for the application.
  - Silver: Bronze plus custom automated responses to defined faults.
  - Gold: Silver plus synthetic transactions and deployed UX monitors.
- Performance Trending: trend the underlying equipment to understand UX impact and investment needs.
- Self Service: offer custom dashboards so the customer/LOB can see application performance in near real time.
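
Because each tier includes everything in the tier below it, the catalog can be modeled as cumulative feature sets. The following is a minimal Python sketch under that assumption; the tier enum and the feature names are hypothetical labels chosen for illustration, not Opmantek configuration keys.

```python
from enum import Enum


class Tier(Enum):
    BRONZE = 1
    SILVER = 2
    GOLD = 3


# Hypothetical feature labels: what each tier adds on top of the tier below it.
_TIER_ADDITIONS = {
    Tier.BRONZE: {"equipment_and_service_monitoring"},
    Tier.SILVER: {"automated_fault_response"},
    Tier.GOLD: {"synthetic_transactions", "ux_monitors"},
}


def features(tier: Tier) -> set[str]:
    """Return every feature in a tier, including those inherited from lower tiers."""
    included: set[str] = set()
    for t in Tier:  # Enum iteration follows definition order: Bronze, Silver, Gold
        if t.value <= tier.value:
            included |= _TIER_ADDITIONS[t]
    return included


print(sorted(features(Tier.SILVER)))
# ['automated_fault_response', 'equipment_and_service_monitoring']
```

Modeling the tiers as nested sets makes "Silver is Bronze plus..." checkable: every Silver feature is also a Gold feature, so an upgrade never removes coverage.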

Service Level Agreements
What can your NOC team support?

SLA Downtime
SLA       Minutes/Month   Hours/Year
97.00%    1,              262.8

- How much downtime can your customer/LOB absorb each day/week/month/quarter?
- How does network/application downtime affect business income?
- Fault/problem response time: usually measured to first touch, not to resolution.
- Hours/days of operation: the use of on-call technicians will affect response times.

ARCHITECTING A SOLUTION FOR FAST CLIENT ONBOARDING AND SCALABILITY

Architecting a Solution
Open-Source
- NMIS: core performance and fault monitoring
Commercial Solutions
- OAE: scheduled discovery and auditing
- opha: supports horizontal and vertical scalability
- opcharts: customer portal with customized dashboards
- opevents: automated event response
- optrend: predictive trend analytics

Equipment Sizing
How much equipment is needed to support the services and SLAs?
NMIS Performance and Fault Monitoring
- An individual server with 6-8 vCPUs and 16-24 GB RAM can support 3,000-5,000 devices.
- The total number of devices an individual server can handle depends on the total number of interfaces collected, device latency and response time, and storage performance.
- More devices than an individual NMIS server can support? Add opha to support horizontal and vertical scaling.
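
The sizing guideline above can be turned into a rough server count. This is a sketch assuming the 3,000-5,000 devices/server range applies uniformly across the estate; as the slide notes, interface counts, device latency and storage performance will move the real figure. The function names are hypothetical.

```python
def pollers_needed(device_count: int, devices_per_server: int) -> int:
    """Servers needed if each one handles at most `devices_per_server` devices."""
    if device_count < 0 or devices_per_server <= 0:
        raise ValueError("device_count must be >= 0 and devices_per_server > 0")
    return -(-device_count // devices_per_server)  # integer ceiling division


def poller_range(device_count: int, low: int = 3_000, high: int = 5_000) -> tuple[int, int]:
    """(optimistic, conservative) server counts for the 3-5k devices/server guideline."""
    return pollers_needed(device_count, high), pollers_needed(device_count, low)


print(poller_range(12_000))  # (3, 4)
```

Integer ceiling division avoids floating-point rounding surprises for large device counts. Any result above one server is the point at which the slide suggests adding opha to spread the load.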

Example Scaled Server Architecture
- Master servers: Master01, Master02, Portal01
- Optionally, additional servers can be added to service new NOCs if latency is > 100 ms.
- Additional polling servers are added as needed for customer or geographic expansion: Slave Poller01, Slave Poller02, Slave Poller03.

Figure: Opmantek Application Flow. Master: opreports, opha, NMIS, opcharts, opevents (exchanging metadata, reports, detail-link and meta-events). Poller: opreports, opha, NMIS, opcharts, opevents, opconfig, opflow (passing summary, api, events and metadata to the master). Data sources: SNMP/WMI, traps, syslog, CLI data and service monitors; NetFlow data from the subnet goes to an opflow Collector.
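
The placement rule on this slide can be expressed as a small decision function. This sketch assumes "latency" means the measured round-trip time from a new site to the existing master; `Site`, `placement`, `plan` and the site names are hypothetical and are not part of opha or NMIS.

```python
from dataclasses import dataclass

LATENCY_THRESHOLD_MS = 100.0  # threshold quoted on the slide


@dataclass
class Site:
    name: str
    rtt_ms: float  # assumed: measured round-trip time from the site to the existing master


def placement(site: Site, threshold_ms: float = LATENCY_THRESHOLD_MS) -> str:
    """Decide where a new site's monitoring should live under the > 100 ms rule."""
    if site.rtt_ms > threshold_ms:
        return "additional server"  # latency too high: service the site from a new NOC server
    return "poller"  # close enough: add polling capacity under the existing master


def plan(sites: list[Site]) -> dict[str, list[str]]:
    """Group site names by placement decision."""
    groups: dict[str, list[str]] = {"additional server": [], "poller": []}
    for site in sites:
        groups[placement(site)].append(site.name)
    return groups


print(plan([Site("site-a", 25.0), Site("site-b", 180.0), Site("site-c", 100.0)]))
# {'additional server': ['site-b'], 'poller': ['site-a', 'site-c']}
```

A site at exactly 100 ms stays on a poller because the slide's condition is strictly "> 100ms"; in practice you would feed this with a sustained latency measurement rather than a single ping.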

opcharts: Customer Portal with Customized Dashboards
WHY: Self-service dashboards reduce client interruptions while giving the client a feeling of control and transparency; for billable clients they can be an up-sell or a service differentiator.
- An opcharts instance is exposed to the internet via a reverse proxy.
- Client accounts are created within opcharts; this can be scripted.
- Custom dashboards, maps, charts and business services are assigned to that user.
- Users can only see the elements you give them access to.

optrend: Dynamic Trending Replaces Static Thresholds for Alerting
WHY: Equipment behaves differently in the real world than in the vendor's best-case lab. By learning what is normal for each device, optrend replaces static thresholds with that device's own baseline.
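
To illustrate the difference between a static threshold and a per-device baseline, here is a generic "mean plus k standard deviations" rule. This is not optrend's algorithm (the deck does not describe it); the function names, k = 3 and the sample histories are illustrative assumptions.

```python
from statistics import mean, stdev


def static_alert(value: float, threshold: float) -> bool:
    """Classic fixed threshold: the same limit for every device."""
    return value > threshold


def baseline_alert(history: list[float], value: float, k: float = 3.0) -> bool:
    """Alert when `value` is more than k standard deviations above this device's own history."""
    if len(history) < 2:
        return False  # not enough data yet to establish what "normal" is
    mu, sigma = mean(history), stdev(history)
    return value > mu + k * sigma


quiet_router = [10, 12, 11, 9, 10, 11, 12, 10]  # e.g. % utilization samples
busy_link = [70, 75, 72, 78, 74]

print(static_alert(30, 80), baseline_alert(quiet_router, 30))  # False True
print(static_alert(82, 80), baseline_alert(busy_link, 82))     # True False
```

With a single 80% threshold, the quiet router's jump to 30% goes unnoticed while the busy link's normal 82% raises a false alarm; the per-device baseline reverses both outcomes, which is the behavior the slide describes.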

Client On-Boarding and Scaling
Reduce Friction, Automate Where Possible, Document Everywhere
How will you onboard new clients?
- Are new clients added to existing polling servers, or will new servers be provisioned?
- How will new devices and services be added to polling servers?
- Will the service be charged back to the client?
For each client, identify:
- Which applications are key to their LOB?
- What services and SLA apply to each application?
- What type(s) of synthetic transactions will the applications support?

IDENTIFYING YOUR FULLY LOADED COSTS

Fully Loaded Cost
Tracking Value to the Business Starts With Understanding Your Costs
- The employee fully loaded cost rate is generally 1.49 (SAP) x annual salary. Contributing factors:
  - 1.25x: employment taxes and benefits
  - 1.75x: office space, equipment
  - 1.25x: related management expenses and non-billable work
- The salary of a Network Engineer II/III in Charlotte, NC is $89,287.
- Annual fully loaded cost = $133,037 - $208,932 (about 1.49x to 2.34x salary).

Staffing a 24/7 NOC
- A single engineer can effectively support 8,000-12,000 devices per shift. This assumes:
  - a properly configured and load-balanced implementation of NMIS, opcharts, opevents, opconfig and optrend, as well as appropriate user/admin training;
  - automation to add/update/retire devices;
  - at least the Silver service level: monitoring of the underlying equipment and services required for the application, plus custom automated response to defined faults;
  - normal system operation.
- Minimum staffing = 2 per shift = 9 FTE = $1,197,333 - $1,880,388 per year, fully loaded.
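
The cost arithmetic on this slide can be reproduced in a few lines. A sketch under stated assumptions: the 2.34x upper multiplier is derived from the slide's $208,932 figure (it is not stated on the slide), and the FTE formula (seats x 168 coverage hours per week / 40 hours per FTE, rounded up) is one hypothetical model that happens to reproduce the slide's 9 FTE; real rosters also allow for leave and training. Whole-dollar rounding means results can differ from the slide by a few dollars.

```python
SALARY = 89_287          # Network Engineer II/III, Charlotte, NC (figure from the slide)
LOW_MULTIPLIER = 1.49    # "generally 1.49 (SAP) x annual salary"
HIGH_MULTIPLIER = 2.34   # assumption: implied by 208,932 / 89,287


def fully_loaded(salary: float, multiplier: float) -> int:
    """Annual fully loaded cost of one employee, rounded to whole dollars."""
    return round(salary * multiplier)


def seats_per_shift(devices: int, devices_per_engineer: int = 8_000, minimum: int = 2) -> int:
    """Engineers on duty per shift, using the conservative end of the 8-12k devices/shift figure."""
    return max(minimum, -(-devices // devices_per_engineer))  # integer ceiling division


def fte_for_24x7(seats: int, hours_per_fte_week: int = 40) -> int:
    """Hypothetical FTE model: seats x 168 coverage hours per week / hours per FTE, rounded up."""
    return -(-(seats * 24 * 7) // hours_per_fte_week)


low = fully_loaded(SALARY, LOW_MULTIPLIER)
high = fully_loaded(SALARY, HIGH_MULTIPLIER)
fte = fte_for_24x7(seats_per_shift(10_000))
print(f"Per engineer: ${low:,} - ${high:,}")
print(f"{fte} FTE NOC: ${fte * low:,} - ${fte * high:,} per year")
```

Separating seats per shift from FTEs makes it easy to see how growth changes cost: with these assumptions, 30,000 devices would need 4 seats per shift, not 2.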

Return On Investment (ROI)
- An average enterprise-class business experiences 262.8 hrs of system downtime per year.
- This equates to an operating SLA of 97%, well below most operational expectations.
- Opmantek's solutions have been shown to reduce downtime by an average of 68%.
- This reduces downtime from 262.8 to ~84 hrs, increasing the SLA to 99.0%.
- For a $150MM business, this creates savings in revenue and productivity of $2MM.
- This equates to an ROI of 93%.
- The investment generally pays for itself in < 10 months.

CONTACT FOR FOLLOW UP
Commercial enquiries: Tom Wiri, Account Executive, +1 (512), usa@opmantek.com
Technical enquiries: Mark Henry, Senior Engineer, +1 (207), markh@opmantek.com
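
The downtime figures on the ROI slide, and the SLA downtime table earlier in the deck, both rest on converting between availability percentage and downtime. A sketch assuming an 8,760-hour year and a 30-day month (the deck does not state which month length its table used); the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24         # 8,760 h; ignores leap years
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 min; assumes a 30-day month


def downtime_for_sla(sla_percent: float) -> tuple[float, float]:
    """Downtime permitted by an availability SLA, as (minutes/month, hours/year)."""
    unavailable = 1 - sla_percent / 100
    return unavailable * MINUTES_PER_MONTH, unavailable * HOURS_PER_YEAR


def sla_for_downtime(hours_per_year: float) -> float:
    """Availability (%) implied by a yearly downtime total."""
    return 100 * (1 - hours_per_year / HOURS_PER_YEAR)


baseline_hours = 262.8  # average enterprise downtime per year (from the slide)
reduction = 0.68        # average downtime reduction (from the slide)
improved_hours = baseline_hours * (1 - reduction)

print(f"Before: {baseline_hours:.1f} h/yr = {sla_for_downtime(baseline_hours):.2f}% availability")
print(f"After:  {improved_hours:.1f} h/yr = {sla_for_downtime(improved_hours):.2f}% availability")
for sla in (97.0, 99.0, 99.9):
    minutes, hours = downtime_for_sla(sla)
    print(f"{sla:.2f}% SLA allows {minutes:,.1f} min/month or {hours:.2f} h/year of downtime")
```

The "after" case works out to about 84.1 h/year, or roughly 99.04% availability, consistent with the slide's ~84 hrs and 99.0% SLA.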